#50880 The protocol ext release is processed too early, while ext acquire has not yet completed its task
Closed: wontfix 5 days ago by tbordaz. Opened 5 months ago by pcech.

This issue was discovered by FreeIPA nightly test failure. Please, see https://pagure.io/freeipa/issue/8182 for more info.

First investigation is https://pagure.io/freeipa/issue/8182#comment-623287

  • The failure looks rare (not a frequent message)

  • The failure had no impact on replication
    The failed release of the suffix occurred at 02:32:11.039, while the acquire was still in progress (02:32:11.033->02:32:11.04)

A question is which suffix was acquired: domain or ipaca?

 [25/Jan/2020:02:32:11.033302037 +0000] conn=444 op=5 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
 [25/Jan/2020:02:32:11.038152131 +0000] conn=444 op=6 EXT oid="2.16.840.1.113730.3.5.5" name="replication-multimaster-extop"
 [25/Jan/2020:02:32:11.039289247 +0000] conn=444 op=6 RESULT err=2 tag=120 nentries=0 etime=0.001146522
 [25/Jan/2020:02:32:11.040436858 +0000] conn=444 op=5 RESULT err=0 tag=120 nentries=0 etime=0.007345663

At the time of the failure, dc=ipa,dc=test was acquired 02:32:10.92->02:32:11.19, so the failure occurred on ipaca

[25/Jan/2020:02:31:40.981091375 +0000] conn=421 fd=118 slot=118 connection from to
[25/Jan/2020:02:31:41.027229695 +0000] conn=421 op=2 RESULT err=0 tag=97 nentries=0 etime=0.003588447 dn="krbprincipalname=ldap/replica1.ipa.test@ipa.test,cn=services,cn=accounts,dc=ipa,dc=test"
[25/Jan/2020:02:32:10.927202273 +0000] conn=421 op=14 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
[25/Jan/2020:02:32:10.928366561 +0000] conn=421 op=14 RESULT err=0 tag=120 nentries=0 etime=0.001391878
[25/Jan/2020:02:32:10.996475150 +0000] conn=421 op=16 ADD dn="dnaHostname=replica1.ipa.test+dnaPortNum=389,cn=posix-ids,cn=dna,cn=ipa,cn=etc,dc=ipa,dc=test"
[25/Jan/2020:02:32:11.032139663 +0000] conn=421 op=16 RESULT err=0 tag=105 nentries=0 etime=0.036019162 csn=5e2ba8ac000200090000
[25/Jan/2020:02:32:11.192339706 +0000] conn=421 op=18 EXT oid="2.16.840.1.113730.3.5.5" name="replication-multimaster-extop"
[25/Jan/2020:02:32:11.196066343 +0000] conn=421 op=18 RESULT err=0 tag=120 nentries=0 etime=0.004009059
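The pairing of EXT requests with their RESULT lines (matched by conn/op) can be automated; the sketch below is a minimal Python illustration written for this analysis, not tooling from the ticket. The OID-to-name mapping follows the 389-ds replication protocol (2.16.840.1.113730.3.5.12 = start session/acquire, 2.16.840.1.113730.3.5.5 = end session/release); err=2 is an LDAP protocol error.

```python
import re

# Replication extended-operation OIDs as used by 389-ds multimaster replication.
EXTOP_NAMES = {
    "2.16.840.1.113730.3.5.12": "start_session (acquire)",
    "2.16.840.1.113730.3.5.5": "end_session (release)",
}

EXT_RE = re.compile(r'conn=(?P<conn>\d+) op=(?P<op>\d+) EXT oid="(?P<oid>[\d.]+)"')
RESULT_RE = re.compile(r'conn=(?P<conn>\d+) op=(?P<op>\d+) RESULT err=(?P<err>\d+)')

def pair_extops(lines):
    """Pair each EXT request with its RESULT on the same conn/op.

    Returns (conn, op, extop_name, err) tuples in RESULT order, which makes
    out-of-order completions (a release finishing before the matching
    acquire) easy to spot.
    """
    pending = {}
    paired = []
    for line in lines:
        m = EXT_RE.search(line)
        if m:
            pending[(m["conn"], m["op"])] = EXTOP_NAMES.get(m["oid"], m["oid"])
            continue
        m = RESULT_RE.search(line)
        if m and (m["conn"], m["op"]) in pending:
            paired.append((m["conn"], m["op"],
                           pending.pop((m["conn"], m["op"])), int(m["err"])))
    return paired

# The conn=444 excerpt from above: the release RESULT lands before the
# acquire RESULT, and fails with err=2.
log = [
    '[25/Jan/2020:02:32:11.033302037 +0000] conn=444 op=5 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"',
    '[25/Jan/2020:02:32:11.038152131 +0000] conn=444 op=6 EXT oid="2.16.840.1.113730.3.5.5" name="replication-multimaster-extop"',
    '[25/Jan/2020:02:32:11.039289247 +0000] conn=444 op=6 RESULT err=2 tag=120 nentries=0 etime=0.001146522',
    '[25/Jan/2020:02:32:11.040436858 +0000] conn=444 op=5 RESULT err=0 tag=120 nentries=0 etime=0.007345663',
]
for conn, op, name, err in pair_extops(log):
    print(conn, op, name, f"err={err}")
```

Run against the conn=444 excerpt, this prints the end_session (err=2) completing before the start_session (err=0), which is the anomaly discussed in this ticket.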

After the failure, we can see that an incoming replication session (from replica1 to the master) can acquire o=ipaca

[25/Jan/2020:02:32:10.007959175 +0000] conn=441 fd=173 slot=173 connection from to
[25/Jan/2020:02:32:10.056260639 +0000] conn=441 op=2 BIND dn="" method=sasl version=3 mech=GSSAPI
[25/Jan/2020:02:32:10.059199064 +0000] conn=441 op=2 RESULT err=0 tag=97 nentries=0 etime=0.003070175 dn="krbprincipalname=ldap/replica1.ipa.test@ipa.test,cn=services,cn=accounts,dc=ipa,dc=test"
[25/Jan/2020:02:33:10.470728885 +0000] conn=441 op=24 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
[25/Jan/2020:02:33:10.474175754 +0000] conn=441 op=24 RESULT err=0 tag=120 nentries=0 etime=0.003966626
[25/Jan/2020:02:33:10.484083780 +0000] conn=441 op=25 ADD dn="cn=2566678943258731036,ou=sessions,ou=Security Domain,o=ipaca"
[25/Jan/2020:02:33:10.501158914 +0000] conn=441 op=25 RESULT err=0 tag=105 nentries=0 etime=0.018178451 csn=5e2ba8d60000000a0000
[25/Jan/2020:02:33:10.590615003 +0000] conn=441 op=26 EXT oid="2.16.840.1.113730.3.5.5" name="replication-multimaster-extop"
[25/Jan/2020:02:33:10.598961726 +0000] conn=441 op=26 RESULT err=0 tag=120 nentries=0 etime=0.008709392
  • In conclusion
    I think the bug is real but has limited impact (only a logged message) and occurs rarely.
    A possible explanation is that an incoming replication connection can be polled for the next operation too early with regard to the suffix status.
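The hypothesis above can be illustrated with a toy model (hypothetical, not the actual 389-ds code): if the connection is polled for the next operation while the acquire is still executing, the release runs against a suffix that is not yet marked acquired and fails with a protocol error, exactly as in the conn=444 trace.

```python
import threading
import time

class Suffix:
    """Toy model of a replica suffix with an acquired flag."""
    def __init__(self):
        self.lock = threading.Lock()
        self.acquired = False

    def acquire(self, delay):
        time.sleep(delay)            # simulate a slow acquire still in progress
        with self.lock:
            self.acquired = True
        return 0                     # success, like err=0 for op=5

    def release(self):
        with self.lock:
            if not self.acquired:
                return 2             # protocol error, like err=2 for op=6
            self.acquired = False
            return 0

suffix = Suffix()
results = {}

# op=5 (acquire) is still running when op=6 (release) is dispatched,
# because the connection was polled for the next operation too early.
t_acq = threading.Thread(
    target=lambda: results.setdefault("acquire", suffix.acquire(0.05)))
t_rel = threading.Thread(
    target=lambda: results.setdefault("release", suffix.release()))
t_acq.start()
t_rel.start()
t_acq.join()
t_rel.join()
print(results)   # typically {'release': 2, 'acquire': 0}
```

With the 50 ms delay the release almost always wins the race, reproducing the observed ordering: the release fails while the acquire later succeeds, with no lasting effect on the suffix state.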

Metadata Update from @tbordaz:
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None

5 months ago

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.4.4

5 months ago

Looking at test_replication_layouts_TestCompleteTopologyWithCAKRA (failure during replica1 install)

Replica1 (no DS error on Replica1)
2020-01-25T02:32:11Z DEBUG Starting external process
2020-01-25T02:32:11Z DEBUG args=['/usr/sbin/pkispawn', '-s', 'KRA', '-f', '/tmp/tmp53o7t0bf']
2020-01-25T02:38:00Z DEBUG Process finished, return code=1
2020-01-25T02:38:00Z DEBUG stderr=WARNING: cert caSigningCert cert-pki-ca already exists

2020-01-25T02:38:00Z CRITICAL Failed to configure KRA instance: CalledProcessError(Command ['/usr/sbin/pkispawn', '-s', 'KRA', '-f', '/tmp/tmp53o7t0bf'] returned non-zero exit status 1: 'WARNING: cert caSigningCert cert-pki-ca already exists\n')

On the master (we can see the pkispawn session)
[25/Jan/2020:02:32:11.089951353 +0000] conn=442 op=2 BIND dn="uid=admin-replica1.ipa.test,ou=people,o=ipaca" method=128 version=3
[25/Jan/2020:02:32:11.290244318 +0000] conn=442 op=2 RESULT err=0 tag=97 nentries=0 etime=0.200780557 dn="uid=admin-replica1.ipa.test,ou=people,o=ipaca"
[25/Jan/2020:02:32:11.295458004 +0000] conn=442 op=3 SRCH base="cn=Enterprise CA Administrators,ou=groups,o=ipaca" scope=0 filter="(uniqueMember=uid=admin-replica1.ipa.test,ou=people,o=ipaca)" attrs="uniqueMember"
[25/Jan/2020:02:32:11.296344468 +0000] conn=442 op=3 RESULT err=0 tag=101 nentries=1 etime=0.001160642
[25/Jan/2020:02:32:11.301207331 +0000] conn=442 op=4 SRCH base="cn=Enterprise KRA Administrators,ou=groups,o=ipaca" scope=0 filter="(uniqueMember=uid=admin-replica1.ipa.test,ou=people,o=ipaca)" attrs="uniqueMember"
[25/Jan/2020:02:32:11.301619087 +0000] conn=442 op=4 RESULT err=0 tag=101 nentries=1 etime=0.000636685
[25/Jan/2020:02:32:11.305161800 +0000] conn=442 op=5 SRCH base="cn=Security Domain Administrators,ou=groups,o=ipaca" scope=0 filter="(uniqueMember=uid=admin-replica1.ipa.test,ou=people,o=ipaca)" attrs="uniqueMember"
[25/Jan/2020:02:32:11.305501221 +0000] conn=442 op=5 RESULT err=0 tag=101 nentries=1 etime=0.000497619
[25/Jan/2020:02:37:59.771657806 +0000] conn=442 op=6 UNBIND
[25/Jan/2020:02:37:59.771717968 +0000] conn=442 op=6 fd=174 closed - U1

During the pkispawn run and failure there is no DS error. @pcech, @frenaud, why do you think DS could contribute to the ipa-replica-install failure?

The issue did not re-occur recently in freeipa nightly tests, please see https://pagure.io/freeipa/issue/8182#comment-662558. The freeipa ticket has been closed, so feel free to also close this ticket if you don't have enough info to work on it.

@frenaud thanks for the feedback. I have no explanation why a replication agreement sends an end_session while it has not yet received the start_session (all in the same second). In any case, this should have no impact on replication. The related freeipa ticket being closed, I will close this one as well.

Metadata Update from @tbordaz:
- Issue close_status updated to: wontfix
- Issue status updated to: Closed (was: Open)

5 days ago
