The nightly test test_integration/test_replication_layouts.py::TestCompleteTopologyWithCAKRA::test_complete_topology_with_ca_kra occasionally fails, see logs in PR #4159.
test_integration/test_replication_layouts.py::TestCompleteTopologyWithCAKRA::test_complete_topology_with_ca_kra
The test failed while installing KRA on the 3rd replica.
self = <ipatests.test_integration.test_replication_layouts.TestCompleteTopologyWithCAKRA object at 0x7f181e0657d0> def test_complete_topology_with_ca_kra(self): tasks.install_topo('complete', self.master, self.replicas, [], > setup_replica_cas=True, setup_replica_kras=True) test_integration/test_replication_layouts.py:121: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ pytest_ipa/integration/tasks.py:1283: in install_topo setup_kra=setup_replica_kras pytest_ipa/integration/tasks.py:515: in install_replica stdin_text=stdin_text) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = <ipatests.pytest_ipa.integration.host.Host replica1.ipa.test (replica)> argv = ['ipa-replica-install', '--admin-password', 'Secret.123', '--password', 'Secret.123', '-U', ...] set_env = True, stdin_text = None, log_stdout = True, raiseonerr = True cwd = None, bg = False, encoding = 'utf-8', ok_returncode = 0 def run_command(self, argv, set_env=True, stdin_text=None, log_stdout=True, raiseonerr=True, cwd=None, bg=False, encoding='utf-8', ok_returncode=0): """Wrapper around run_command to log stderr on raiseonerr=True :param ok_returncode: return code considered to be correct, you can pass an integer or sequence of integers """ result = super().run_command( argv, set_env=set_env, stdin_text=stdin_text, log_stdout=log_stdout, raiseonerr=False, cwd=cwd, bg=bg, encoding=encoding ) # in FIPS mode SSH may print noise to stderr, remove the string # "FIPS mode initialized" + optional newline. result.stderr_bytes = FIPS_NOISE_RE.sub(b'', result.stderr_bytes) try: result_ok = result.returncode in ok_returncode except TypeError: result_ok = result.returncode == ok_returncode if not result_ok and raiseonerr: result.log.error('stderr: %s', result.stderr_text) raise subprocess.CalledProcessError( result.returncode, argv, > result.stdout_text, result.stderr_text ) E subprocess.CalledProcessError: Command '['ipa-replica-install', '--admin-password', 'Secret.123', '--password', 'Secret.123', '-U', '--setup-ca', '--setup-kra', '--ip-address', '192.168.122.204', '--realm', 'IPA.TEST', '--domain', 'ipa.test']' returned non-zero exit status 1.
There is strange message in the log http://freeipa-org-pr-ci.s3-website.eu-central-1.amazonaws.com/jobs/b3dafa02-3f12-11ea-86b4-fa163e74bb0d/test_integration-test_replication_layouts.py-TestCompleteTopologyWithCAKRA-test_complete_topology_with_ca_kra/replica2.ipa.test/journal.gz
Jan 25 02:32:10 replica2.ipa.test ns-slapd[23843]: [25/Jan/2020:02:32:10.965551444 +0000] - ERR - NSMMReplicationPlugin - release_replica - agmt="cn=caTomaster.ipa.test" (master:389): Unable to parse the response to the endReplication extended operation.
@tbordaz could you please check this issue?
replica Agreement Replica2->Master (ipaca) is sending updates then regular acquire/release to test if something needs to be sent On Master
[25/Jan/2020:02:32:10.904307763 +0000] conn=444 fd=181 slot=181 connection from 192.168.122.153 to 192.168.122.248 [25/Jan/2020:02:32:10.960168168 +0000] conn=444 op=2 BIND dn="" method=sasl version=3 mech=GSSAPI [25/Jan/2020:02:32:10.976500214 +0000] conn=444 op=2 RESULT err=0 tag=97 nentries=0 etime=0.016479221 dn="krbprincipalname=ldap/replica2.ipa.test@ipa.test,cn=services,cn=accounts,dc=ipa,dc=test" ... [25/Jan/2020:02:32:11.033302037 +0000] conn=444 op=5 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop" [25/Jan/2020:02:32:11.038152131 +0000] conn=444 op=6 EXT oid="2.16.840.1.113730.3.5.5" name="replication-multimaster-extop" [25/Jan/2020:02:32:11.039289247 +0000] conn=444 op=6 RESULT err=2 tag=120 nentries=0 etime=0.001146522 [25/Jan/2020:02:32:11.040436858 +0000] conn=444 op=5 RESULT err=0 tag=120 nentries=0 etime=0.007345663 [25/Jan/2020:02:33:10.090037743 +0000] conn=444 op=7 UNBIND
No error logged
On Replica1: the following message says that Master's response was invalid to the op=6. But it looks normal as Master was not able to decode it.
[25/Jan/2020:02:32:10.965551444 +0000] - ERR - NSMMReplicationPlugin - release_replica - agmt="cn=caTomaster.ipa.test" (master:389): Unable to parse the response to the endReplication extended operation.
The only explanation (PROTOCOL error and no append message) is that replication callback (of release replica) hit an internal error: can not retrieve the connection extension where is replication info are stored.
Something specific is that op=6 was received while op=5 result was not logged. I think it could be the reason of the error as the acquired (op=5) may have not yet stored replication info in the connection.
In conclusion: I think the error comes from the protocol ext release being processed too early while ext acquire have not yet complete its task. It is looking like a DS bug.
@tbordaz I created https://pagure.io/389-ds-base/issue/50880 on DS side. Could you please provide necessary info there, so your team can work on it? Thanks.
There is a bug in DS #50880 that triggers the error message. The symptom is limited to the message in error log as the error is transient and not fatal to replication. In addition the bug should be rare.
Also happened in PR #184 - logs
As of today (2020-06-30): - no test failure in [testing_master_latest] = fedora 32, for the last 12 runs (since June 02) - no test failure in [testing_master_previous] = fedora 31, for the last 8 runs (since June 03) - no test failure in [testing_master_testing] = fedora 32 with updates-testing repo, for the last 4 runs (since May 31) - no test failure in [testing_ipa-4-8_latest] = fedora 32, for the last 5 runs (since May 30) - no test failure in [testing_ipa-4-8_previous] = fedora 31, for the last 3 runs (since May 31)
I think we can close this issue for the time being, and re-open if it re-occurs.
Metadata Update from @frenaud: - Issue close_status updated to: worksforme - Issue status updated to: Closed (was: Open)
Login to comment on this ticket.