#48837 Replication: total init aborted
Closed: wontfix None Opened 7 years ago by tbordaz.

Testing with

The test case is: Install a ipa-server, install a replica (client then replica).

The logs are:

Supplier
[13/May/2016:14:24:43.942901892 +0200] NSMMReplicationPlugin - Beginning total update of replica "agmt="cn=meTovm-fqdn>" (vm:389)".
[13/May/2016:14:24:43.946442985 +0200] NSMMReplicationPlugin - Need to create replication keep alive entry <cn=repl keep alive 4,<suffix>>
[13/May/2016:14:24:43.948395482 +0200] NSMMReplicationPlugin - add dn: cn=repl keep alive 4,<suffix>
objectclass: top
objectclass: ldapsubentry
objectclass: extensibleObject
cn: repl keep alive 4
[13/May/2016:14:24:43.962849222 +0200] NSMMReplicationPlugin - agmt="cn=meTo<vm-fqdn>" (vm:389): Received error 2 (Protocol error):  for total update operation
[13/May/2016:14:24:43.964056847 +0200] NSMMReplicationPlugin - agmt="cn=meTovm-fqdn>" (vm:389): Received error 2 (Protocol error):  for total update operation
[13/May/2016:14:24:43.965257002 +0200] NSMMReplicationPlugin - agmt="cn=meTovm-fqdn>" (vm:389): Warning: unable to send endReplication extended operation (Protocol error)

Replica
...
[13/May/2016:14:24:43.244656584 +0200] ipa-topology-plugin - ipa_topo_be_state_changebackend userRoot is going offline; inactivate plugin
[13/May/2016:14:24:43.254646244 +0200] NSMMReplicationPlugin - multimaster_be_state_change: replica <suffix> is going offline; disabling replication
[13/May/2016:14:24:43.377911430 +0200] WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[13/May/2016:14:24:43.967685367 +0200] ERROR bulk import abandoned
[13/May/2016:14:24:43.986159599 +0200] import userRoot: Thread monitoring returned: -23

[13/May/2016:14:24:43.988339717 +0200] import userRoot: Aborting all Import threads...
[13/May/2016:14:24:52.000458046 +0200] import userRoot: Import threads aborted.
[13/May/2016:14:24:52.003123755 +0200] import userRoot: Closing files...
[13/May/2016:14:24:52.007013515 +0200] import userRoot: Import failed.
[13/May/2016:14:24:52.008702760 +0200] process_bulk_import_op: NULL target sdn


[13/May/2016:14:24:43.227507390 +0200] conn=4 op=0 BIND dn="" method=sasl version=3 mech=GSSAPI
...
[13/May/2016:14:24:43.236652974 +0200] conn=4 op=2 RESULT err=0 tag=97 nentries=0 etime=0 dn="cn=ldap/<vm-fqdn>@domain>,cn=config"
[13/May/2016:14:24:43.237065786 +0200] conn=4 op=3 SRCH base="" scope=0 filter="(objectClass=*)" attrs="supportedControl supportedExtension"
...
[13/May/2016:14:24:43.960003625 +0200] conn=4 op=9 EXT oid="2.16.840.1.113730.3.5.6" name="replication-multimaster-extop"
[13/May/2016:14:24:43.963550368 +0200] conn=4 op=9 RESULT err=2 tag=120 nentries=0 etime=0
..
[13/May/2016:14:24:43.967436084 +0200] conn=4 op=17 EXT oid="2.16.840.1.113730.3.5.6" name="replication-multimaster-extop"
[13/May/2016:14:24:43.967532226 +0200] conn=4 op=17 fd=66 closed - B4


2016-05-13T12:24:44Z DEBUG   [error] RuntimeError: Failed to start replication

Not certain yet, but we need to revisit the changes on plugin_call_exop_plugins made by these patches...

commit b57fe64
Author: William Brown firstyear@redhat.com
Date: Tue Apr 26 18:07:57 2016 +1000
Ticket 48770 - Improve extended op plugin handling

commit d5589b5
Author: William Brown firstyear@redhat.com
Date: Thu Apr 28 10:05:20 2016 +1000
Ticket 48770 - Improve extended op plugin handling

commit 2ecc937
Author: Noriko Hosoi nhosoi@redhat.com
Date: Wed May 4 14:44:30 2016 -0700
Ticket #48822 - (389-ds-base-1.3.5) Fixing coverity issues.

Regarding the coverity fix (commit 2ecc937), if it is wrong, then the logic before the coverity fix is wrong, too. The if expression "(lderr == SLAPI_PLUGIN_EXTENDED_NOT_HANDLED || rc != LDAP_SUCCESS)" is not needed at all, and this assignment is always expected "lderr = rc;" If that's the only an issue, we can fix it as such. But let's review the plugin.c diff one more time.

{{{
diff --git a/ldap/servers/slapd/plugin.c b/ldap/servers/slapd/plugin.c
index 5b81779..f196d2c 100644
--- a/ldap/servers/slapd/plugin.c
+++ b/ldap/servers/slapd/plugin.c
@@ -538,7 +538,7 @@ plugin_call_exop_plugins( Slapi_PBlock pb, struct slapdplugin p )
/
* simple merge: report last real error
/
- if ( lderr == SLAPI_PLUGIN_EXTENDED_NOT_HANDLED || rc != LDAP_SUCCESS ) {
+ if ( rc != LDAP_SUCCESS ) {
lderr = rc;
}
}

}}}

I successfully tested master branch + https://fedorahosted.org/389/ticket/48836 - 2ecc937, IMHO 2ecc937 is not neutral.

Reviewing the fix. If for example (*p->plg_exhandler)( pb ) returns LDAP_SUCCESS, then plugin_call_exop_plugins will return SLAPI_PLUGIN_EXTENDED_NOT_HANDLED where it use to return LDAP_SUCCESS.

Replying to [comment:5 tbordaz]:

IMHO 2ecc937 is not neutral.

Reviewing the fix. If for example (*p->plg_exhandler)( pb ) returns LDAP_SUCCESS, then plugin_call_exop_plugins will return SLAPI_PLUGIN_EXTENDED_NOT_HANDLED where it use to return LDAP_SUCCESS.

you're right. But it could probably be made simpler by just setting lderr = rc in the else branch

Replying to [comment:6 lkrispen]:

Replying to [comment:5 tbordaz]:

IMHO 2ecc937 is not neutral.

Reviewing the fix. If for example (*p->plg_exhandler)( pb ) returns LDAP_SUCCESS, then plugin_call_exop_plugins will return SLAPI_PLUGIN_EXTENDED_NOT_HANDLED where it use to return LDAP_SUCCESS.

you're right. But it could probably be made simpler by just setting lderr = rc in the else branch

Great! Probably, we should build 1.3.5.4 with your fix, Thierry. (maybe with the idea from Ludwig.) Thierry, could you please make a patch and push it since both Ludwig and noriko acked it proactively? ;)

Since Ludwig and Thierry are in the long holiday, I'm revisiting this ticket...

It looks to me plugin_call_exop_plugins should always return the return code rc from (*p->plg_exhandler)( pb )...

My observation:
If (p->plg_exhandler)(pb) returns SLAPI_PLUGIN_EXTENDED_SENT_RESULT, the return value rc is returned.
If rc is not SLAPI_PLUGIN_EXTENDED_NOT_HANDLED, rc used to be always returned due to the DEADCODE. I changed it only when rc is not LDAP_SUCCESS, then it broke the total update. That is, if rc is not SLAPI_PLUGIN_EXTENDED_NOT_HANDLED, rc should be returned.
If rc is SLAPI_PLUGIN_EXTENDED_NOT_HANDLED, SLAPI_PLUGIN_EXTENDED_NOT_HANDLED is returned.

That being said, my proposal is ...
{{{
int
plugin_call_exop_plugins( Slapi_PBlock pb, struct slapdplugin p )
{
int rc;
slapi_pblock_set( pb, SLAPI_PLUGIN, p );
set_db_default_result_handlers( pb );
rc = (*p->plg_exhandler)( pb );
return (rc);
}
}}}
Am I missing something?

that was my conclusion as well, coverty was right,there was a lot of dead code, and alwast the result of the plugin call was returned

Well, given that now, that one, and only one exop plugin can handle the action, having a single result from the plugin is probably the right thing to do here. This is a good change.

Unfortunately, still the patch 0001-Ticket-48837-Replication-total-init-aborted.patch​ added has a problem. It issues this message for every single entry to be sent in the total update.

[16/May/2016:11:51:23.720833103 -0700] extendop.c failed with result 0
[16/May/2016:11:51:23.726204429 -0700] extendop.c failed with result 0

If the return value of (p->plg_exhandler)( pb ) is 0 (== LDAP_SUCCESS), should it be converted to SLAPI_PLUGIN_EXTENDED_SENT_RESULT something like this?
{{{
diff --git a/ldap/servers/slapd/plugin.c b/ldap/servers/slapd/plugin.c
index 3c32c55..a4e4de4 100644
--- a/ldap/servers/slapd/plugin.c
+++ b/ldap/servers/slapd/plugin.c
@@ -531,7 +531,10 @@ plugin_call_exop_plugins( Slapi_PBlock
pb, struct slapdplugin p )
slapi_pblock_set( pb, SLAPI_PLUGIN, p );
set_db_default_result_handlers( pb );
rc = (
p->plg_exhandler)( pb );
- return (rc);
+ if (!rc) { / LDAP_SUCCESS needs to translated to SLAPI_PLUGIN_EXTENDED_SENT_RESULT? /
+ rc = SLAPI_PLUGIN_EXTENDED_SENT_RESULT;
+ }
+ return (rc);
}
}}}
If we do this, then the above "extendop.c failed with result 0" is not logged...

Well, the plugin should be setting SLAPI_PLUGIN_EXTENDED_SENT_RESULT I thought?

dna.c
{{{
4418 ret = SLAPI_PLUGIN_EXTENDED_SENT_RESULT;
}}}

So is this actually revealing an issue in the plugin?

Also, that solution, what happens if the rc is SLAPI_PLUGIN_EXTENDED_SENT_RESULT? Won't this now cause the plugin to return SENT_RESULT when the plugin fails and it returns a PLUGIN_NOT_HANDLED?

perhaps this is the solution ....

{{{
if (rc == LDAP_SUCCESS) {
rc = SLAPI_PLUGIN_EXTENDED_SENT_RESULT;
}
}}}

The plugin which is having the issue is the consumer initialization (aka total update, bulk import, etc. :). Should the plugin be modified to return SLAPI_PLUGIN_EXTENDED_SENT_RESULT in case of success? I think there's no problem to do so.

But probably, it's nicer to allow the plugins to return LDAP_SUCCESS, IMO...

I don't really mind. But to make the change to allow LDAP_SUCCESS, your patch needs to explicitly look for LDAP_SUCCESS for the conversion, it cannot check for (!rc).

That seems logically okay to me. I have not built or tested it, but code wise I will ack.

Thanks to Ludwig and William for their reviews and comments.

Pushed to master:
9eee3a8..e6ba94f master -> master
commit e6ba94f

Fix e6ba94f is successfully tested (description test case) with https://fedorahosted.org/389/ticket/48836 (master+48836)

Metadata Update from @nhosoi:
- Issue set to the milestone: 1.3.5.4

7 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/1897

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

3 years ago

Login to comment on this ticket.

Metadata