When I upgrade an IPA replica before the master, selinuxusermap entries added on the master are not replicated to the replica.
Version-Release number of selected component (if applicable):
389-ds-base-1.2.11.15-11.el6.x86_64
ipa-server-3.0.0-26.el6_4.2.x86_64
Steps to Reproduce:
Actual results: Not able to see selinuxusermap entries on the replica that were created on the master.
{{{
[root@ipaqavmc slapd-TESTRELM-COM]# ipa selinuxusermap-show serule1
ipa: ERROR: serule1: SELinux User Map rule not found
}}}
Expected results: Should see it on the replica, the same as on the master:
{{{
[root@ipaqavmb slapd-TESTRELM-COM]# ipa selinuxusermap-show serule1
  Rule name: serule1
  SELinux User: staff_u:s0-s0:c0.c1023
  Host category: all
  Enabled: TRUE
  Users: jordan
}}}
The directory server supplier is sending over the same changes twice. The reason is that the RUV returned from the consumer (the "Consumer RUV" in the supplier error log) is bogus: the RUV element for the supplier (rid 4) is empty and even has the wrong port number in its pURL. The RUV element for the supplier should contain the max CSN of the most recent changes sent over.
This is a case of a duplicate ADD - the entries were added directly to the replica earlier:
{{{
[14/May/2013:12:57:00 -0400] conn=7 op=25 ADD dn="cn=selinux,dc=testrelm,dc=com"
[14/May/2013:12:57:00 -0400] conn=7 op=25 RESULT err=0 tag=105 nentries=0 etime=0 csn=51926cdd000000030000
}}}
The replica was unable to send this change to the master because there was a problem with replication:
{{{
[14/May/2013:12:57:00 -0400] slapi_ldap_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: error -2 (Local error)
[14/May/2013:12:57:00 -0400] NSMMReplicationPlugin - agmt="cn=meToqe-blade-09.testrelm.com" (qe-blade-09:389): Replication bind with GSSAPI auth failed: LDAP error -2 (Local error) (SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Cannot determine realm for numeric host address))
}}}
Replication resumes:
{{{
[14/May/2013:12:59:12 -0400] NSMMReplicationPlugin - agmt="cn=meToqe-blade-09.testrelm.com" (qe-blade-09:389): Replication bind with GSSAPI auth resumed
}}}
Schema replication issue:
{{{
[14/May/2013:12:59:12 -0400] NSMMReplicationPlugin - agmt="cn=meToqe-blade-09.testrelm.com" (qe-blade-09:389): Warning: unable to replicate schema: rc=1
}}}
The above add, and several other changes, appear to be missing from the supplier RUV:
{{{
[14/May/2013:12:59:12 -0400] - _cl5PositionCursorForReplay (agmt="cn=meToqe-blade-09.testrelm.com" (qe-blade-09:389)): Supplier RUV:
[14/May/2013:12:59:12 -0400] NSMMReplicationPlugin - agmt="cn=meToqe-blade-09.testrelm.com" (qe-blade-09:389): {replicageneration} 51926881000000040000
[14/May/2013:12:59:12 -0400] NSMMReplicationPlugin - agmt="cn=meToqe-blade-09.testrelm.com" (qe-blade-09:389): {replica 3 ldap://ipaqavma.testrelm.com:389} 51926d51000000030000 51926d61000000030000 51926d60
}}}
Note that the min csn in the RUV element for the replica (rid 3) is 51926d51000000030000, which is greater than 51926cdd000000030000. But the consumer has this:
{{{
[14/May/2013:12:59:12 -0400] NSMMReplicationPlugin - agmt="cn=meToqe-blade-09.testrelm.com" (qe-blade-09:389): {replica 3 ldap://ipaqavma.testrelm.com:389} 51926887000800030000 51926cc4000000030000 00000000
}}}
51926cc4000000030000 is less than 51926cdd000000030000, so the master has not seen that change yet.
The replica attempts to replay these changes to the master:
{{{
[14/May/2013:12:59:12 -0400] agmt="cn=meToqe-blade-09.testrelm.com" (qe-blade-09:389) - session start: anchorcsn=51926cc4000000030000
[14/May/2013:12:59:12 -0400] agmt="cn=meToqe-blade-09.testrelm.com" (qe-blade-09:389) - clcache_load_buffer: rc=-30988
[14/May/2013:12:59:12 -0400] NSMMReplicationPlugin - changelog program - agmt="cn=meToqe-blade-09.testrelm.com" (qe-blade-09:389): CSN 51926aa6000000040000 not found and no purging, probably a reinit
}}}
So, for some reason, the replica doesn't have 51926cc4000000030000. This is a change which originated on the replica (rid 3) - not sure why it isn't found. Since it isn't found, the supplier falls back to the min csn from its own RUV, 51926d51000000030000, which skips the changes made earlier.
This causes the changelog to be wiped out:
{{{
[14/May/2013:12:56:50 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [51926cc4000000030000] from RUV [changelog max RUV] is larger than the max CSN [] from RUV [database RUV] for element [{replica 3} 51926887000800030000 51926cc4000000030000]
[14/May/2013:12:56:50 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica dc=testrelm,dc=com does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
}}}
I have no idea why the database RUV is empty. The RUV element in this case is from the changelog RUV.
The RUV element for replica 4 seems to change in two steps:
1] changing the port to 0:
{{{
[14/May/2013:13:10:07 -0400] NSMMReplicationPlugin - agmt="cn=meToipaqavma.testrelm.com" (ipaqavma:389): {replica 4 ldap://qe-blade-09.testrelm.com:0} 51926fef000000040000 51926ff0000400040000 51926fef
}}}
2] removing the CSNs:
{{{
[14/May/2013:13:10:13 -0400] NSMMReplicationPlugin - agmt="cn=meToipaqavma.testrelm.com" (ipaqavma:389): {replica 4 ldap://qe-blade-09.testrelm.com:0}
}}}
ruv_compare_ruv seems to assume that the RUV elements are in the same order; maybe it compares rid 3 to the empty rid 4 element.
Why does the port get changed to 0? I remember this could happen when a replica is demoted from master to hub/consumer, but I couldn't find that in RHDS.
I can reproduce the behavior, and it does seem to be related to the fact that nsslapd-port is changed to 0. MMR constructs a local purl (partial URL) like this:
{{{
multimaster_set_local_purl()
    local_purl = slapi_ch_smprintf("ldap://%s:%s", config_get_localhost(), config_get_port());
}}}
Since the port is 0, the purl looks like "ldap://hostname:0". The code in ruv_init_from_slapi_attr_and_check_purl() tries to make sure the purl in the RUV element matches the server's local purl. In this case it no longer matches, since the port number has changed, so the RUV element is reset and the min and max CSN are wiped out.
I think in this case, we should check only the hostname, not the port number.
0001-additional-RUV-debugging.patch
0002-Ticket-47362-ipa-upgrade-selinuxusermap-data-not-rep.patch
{{{
76c87bd..0c194eb  389-ds-base-1.2.11 -> 389-ds-base-1.2.11
commit 0c194eb
Author: Rich Megginson <rmeggins@redhat.com>
Date: Wed May 15 19:39:24 2013 -0600

ce102a9..2777aef  389-ds-base-1.3.0 -> 389-ds-base-1.3.0
commit 2777aef
Author: Rich Megginson <rmeggins@redhat.com>
Date: Wed May 15 19:39:24 2013 -0600

3da40b4..2909b17  389-ds-base-1.3.1 -> 389-ds-base-1.3.1
commit 2909b17
Author: Rich Megginson <rmeggins@redhat.com>
Date: Wed May 15 19:39:24 2013 -0600

f2b5a97..6236d7a  master -> master
commit 6236d7a
Author: Rich Megginson <rmeggins@redhat.com>
Date: Wed May 15 19:39:24 2013 -0600
}}}
Metadata Update from @rmeggins: - Issue assigned to rmeggins - Issue set to the milestone: 1.2.11.22
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/699
If you want to receive further updates on the issue, please navigate to the GitHub issue and click on the Subscribe button.
Thank you for understanding. We apologize for any inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: Fixed)