#8182 Nightly test failure in test_integration/test_replication_layouts.py::TestCompleteTopologyWithCAKRA::test_complete_topology_with_ca_kra
Opened 5 months ago by frenaud. Modified 19 days ago

The nightly test test_integration/test_replication_layouts.py::TestCompleteTopologyWithCAKRA::test_complete_topology_with_ca_kra occasionally fails, see logs in PR #4159.

The test failed while installing KRA on the 3rd replica.

self = <ipatests.test_integration.test_replication_layouts.TestCompleteTopologyWithCAKRA object at 0x7f181e0657d0>

    def test_complete_topology_with_ca_kra(self):
        tasks.install_topo('complete', self.master, self.replicas, [],
>                          setup_replica_cas=True, setup_replica_kras=True)

test_integration/test_replication_layouts.py:121: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pytest_ipa/integration/tasks.py:1283: in install_topo
    setup_kra=setup_replica_kras
pytest_ipa/integration/tasks.py:515: in install_replica
    stdin_text=stdin_text)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <ipatests.pytest_ipa.integration.host.Host replica1.ipa.test (replica)>
argv = ['ipa-replica-install', '--admin-password', 'Secret.123', '--password', 'Secret.123', '-U', ...]
set_env = True, stdin_text = None, log_stdout = True, raiseonerr = True
cwd = None, bg = False, encoding = 'utf-8', ok_returncode = 0

    def run_command(self, argv, set_env=True, stdin_text=None,
                    log_stdout=True, raiseonerr=True,
                    cwd=None, bg=False, encoding='utf-8', ok_returncode=0):
        """Wrapper around run_command to log stderr on raiseonerr=True

        :param ok_returncode: return code considered to be correct,
                              you can pass an integer or sequence of integers
        """
        result = super().run_command(
            argv, set_env=set_env, stdin_text=stdin_text,
            log_stdout=log_stdout, raiseonerr=False, cwd=cwd, bg=bg,
            encoding=encoding
        )
        # in FIPS mode SSH may print noise to stderr, remove the string
        # "FIPS mode initialized" + optional newline.
        result.stderr_bytes = FIPS_NOISE_RE.sub(b'', result.stderr_bytes)
        try:
            result_ok = result.returncode in ok_returncode
        except TypeError:
            result_ok = result.returncode == ok_returncode
        if not result_ok and raiseonerr:
            result.log.error('stderr: %s', result.stderr_text)
            raise subprocess.CalledProcessError(
                result.returncode, argv,
>               result.stdout_text, result.stderr_text
            )
E           subprocess.CalledProcessError: Command '['ipa-replica-install', '--admin-password', 'Secret.123', '--password', 'Secret.123', '-U', '--setup-ca', '--setup-kra', '--ip-address', '192.168.122.204', '--realm', 'IPA.TEST', '--domain', 'ipa.test']' returned non-zero exit status 1.

There is strange message in the log http://freeipa-org-pr-ci.s3-website.eu-central-1.amazonaws.com/jobs/b3dafa02-3f12-11ea-86b4-fa163e74bb0d/test_integration-test_replication_layouts.py-TestCompleteTopologyWithCAKRA-test_complete_topology_with_ca_kra/replica2.ipa.test/journal.gz

Jan 25 02:32:10 replica2.ipa.test ns-slapd[23843]: [25/Jan/2020:02:32:10.965551444 +0000] - ERR - NSMMReplicationPlugin - release_replica - agmt="cn=caTomaster.ipa.test" (master:389): Unable to parse the response  to the endReplication extended operation.

@tbordaz could you please check this issue?

  • replica Agreement Replica2->Master (ipaca) is sending updates then regular acquire/release to test if something needs to be sent
    On Master

    [25/Jan/2020:02:32:10.904307763 +0000] conn=444 fd=181 slot=181 connection from 192.168.122.153 to 192.168.122.248
    [25/Jan/2020:02:32:10.960168168 +0000] conn=444 op=2 BIND dn="" method=sasl version=3 mech=GSSAPI
    [25/Jan/2020:02:32:10.976500214 +0000] conn=444 op=2 RESULT err=0 tag=97 nentries=0 etime=0.016479221 dn="krbprincipalname=ldap/replica2.ipa.test@ipa.test,cn=services,cn=accounts,dc=ipa,dc=test"
    ...
    [25/Jan/2020:02:32:11.033302037 +0000] conn=444 op=5 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
    [25/Jan/2020:02:32:11.038152131 +0000] conn=444 op=6 EXT oid="2.16.840.1.113730.3.5.5" name="replication-multimaster-extop"
    [25/Jan/2020:02:32:11.039289247 +0000] conn=444 op=6 RESULT err=2 tag=120 nentries=0 etime=0.001146522
    [25/Jan/2020:02:32:11.040436858 +0000] conn=444 op=5 RESULT err=0 tag=120 nentries=0 etime=0.007345663
    [25/Jan/2020:02:33:10.090037743 +0000] conn=444 op=7 UNBIND

No error logged

  • Replica2 sent a acquire request (op=5) and a release (op=6) acquire succeeded (err=0) but release failed with PROTOCOL error AND no specific message

On Replica1: the following message says that Master's response was invalid to the op=6. But it looks normal as Master was not able to decode it.

[25/Jan/2020:02:32:10.965551444 +0000] - ERR - NSMMReplicationPlugin - release_replica - agmt="cn=caTomaster.ipa.test" (master:389): Unable to parse the response  to the endReplication extended operation.

The only explanation (PROTOCOL error and no append message) is that replication callback (of release replica) hit an internal error: can not retrieve the connection extension where is replication info are stored.

  • Something specific is that op=6 was received while op=5 result was not logged. I think it could be the reason of the error as the acquired (op=5) may have not yet stored replication info in the connection.

    In conclusion:
    I think the error comes from the protocol ext release being processed too early while
    ext acquire have not yet complete its task.
    It is looking like a DS bug.

@tbordaz I created https://pagure.io/389-ds-base/issue/50880 on DS side. Could you please provide necessary info there, so your team can work on it? Thanks.

There is a bug in DS #50880 that triggers the error message. The symptom is limited to the message in error log as the error is transient and not fatal to replication. In addition the bug should be rare.

Login to comment on this ticket.

Metadata