#8533 Nightly failure in ipa-replica-install configuring renewals: DBusException: org.freedesktop.DBus.Error.NoReply
Closed: fixed 2 years ago by frenaud. Opened 3 years ago by frenaud.

Issue

The nightly tests failed in test_integration/test_external_ca.py::TestExternalCA::test_external_ca in the replica installation phase, while configuring certificate renewals. See the PR #460 [testing_master_latest] using the master branch + fedora 32.

Logs
report

self = <ipatests.test_integration.test_external_ca.TestExternalCA object at 0x7efcbc77b2b0>

    def test_external_ca(self):
        # Step 1 of ipa-server-install.
        result = install_server_external_ca_step1(
            self.master, extra_args=['--external-ca-type=ms-cs']
        )
        assert result.returncode == 0

        # check CSR for extension
        ipa_csr = self.master.get_file_contents(paths.ROOT_IPA_CSR)
        check_mscs_extension(ipa_csr, ipa_x509.MSCSTemplateV1(u'SubCA'))

        # Sign CA, transport it to the host and get ipa a root ca paths.
        root_ca_fname, ipa_ca_fname = tasks.sign_ca_and_transport(
            self.master, paths.ROOT_IPA_CSR, ROOT_CA, IPA_CA)

        # Step 2 of ipa-server-install.
        result = install_server_external_ca_step2(
            self.master, ipa_ca_fname, root_ca_fname)
        assert result.returncode == 0

        # Make sure IPA server is working properly
        tasks.kinit_admin(self.master)
        result = self.master.run_command(['ipa', 'user-show', 'admin'])
        assert 'User login: admin' in result.stdout_text

        # check that we can also install replica
>       tasks.install_replica(self.master, self.replicas[0])

test_integration/test_external_ca.py:154: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pytest_ipa/integration/tasks.py:462: in install_replica
    result = replica.run_command(args, raiseonerr=raiseonerr,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <ipatests.pytest_ipa.integration.host.Host replica0.ipa.test (replica)>
argv = ['ipa-replica-install', '--admin-password', 'Secret.123', '--password', 'Secret.123', '-U', ...]
set_env = True, stdin_text = None, log_stdout = True, raiseonerr = True
cwd = None, bg = False, encoding = 'utf-8', ok_returncode = 0

    def run_command(self, argv, set_env=True, stdin_text=None,
                    log_stdout=True, raiseonerr=True,
                    cwd=None, bg=False, encoding='utf-8', ok_returncode=0):
        """Wrapper around run_command to log stderr on raiseonerr=True

        :param ok_returncode: return code considered to be correct,
                              you can pass an integer or sequence of integers
        """
        result = super().run_command(
            argv, set_env=set_env, stdin_text=stdin_text,
            log_stdout=log_stdout, raiseonerr=False, cwd=cwd, bg=bg,
            encoding=encoding
        )
        # in FIPS mode SSH may print noise to stderr, remove the string
        # "FIPS mode initialized" + optional newline.
        result.stderr_bytes = FIPS_NOISE_RE.sub(b'', result.stderr_bytes)
        try:
            result_ok = result.returncode in ok_returncode
        except TypeError:
            result_ok = result.returncode == ok_returncode
        if not result_ok and raiseonerr:
            result.log.error('stderr: %s', result.stderr_text)
>           raise subprocess.CalledProcessError(
                result.returncode, argv,
                result.stdout_text, result.stderr_text
            )
E           subprocess.CalledProcessError: Command '['ipa-replica-install', '--admin-password', 'Secret.123', '--password', 'Secret.123', '-U', '--setup-ca', '--ip-address', '192.168.122.26', '--realm', 'IPA.TEST', '--domain', 'ipa.test']' returned non-zero exit status 1.

Replica install logs:

2020-10-07T23:46:21Z DEBUG Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/ipaserver/install/service.py", line 626, in start_creation
    run_step(full_msg, method)
  File "/usr/lib/python3.8/site-packages/ipaserver/install/service.py", line 612, in run_step
    method()
  File "/usr/lib/python3.8/site-packages/ipaserver/install/dogtaginstance.py", line 436, in configure_renewal
    certmonger.start_tracking(
  File "/usr/lib/python3.8/site-packages/ipalib/install/certmonger.py", line 574, in start_tracking
    result = cm.obj_if.add_request(params)
  File "/usr/lib64/python3.8/site-packages/dbus/proxies.py", line 141, in __call__
    return self._connection.call_blocking(self._named_service,
  File "/usr/lib64/python3.8/site-packages/dbus/connection.py", line 652, in call_blocking
    reply_message = self.send_message_with_reply_and_block(
dbus.exceptions.DBusException: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

Other issues seen recently related to Dbus:
- 8470 ipa-replica-install now failing where it used to succeed: ipa: ERROR: non-public: DBusException: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken. happens in replica-conncheck, fedora 32.
- 8506 Nightly failure (rawhide) in ipa-server-install --uninstall: org.freedesktop.DBus.Error.NoReply: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken. The ipa-server-install command failed. See /var/log/ipaserver-uninstall.log for more information happens when unconfiguring certmonger CA helper, rahide (currently fedora 34)

Some observations:

I think this may be a red herring but the journal in the same second as the dbus failure:

Oct 07 23:46:21 replica0.ipa.test certmonger[22195]: 2020-10-07 23:46:21 [22195] Error shutting down NSS.

This means NSS thought certmonger held some resource open at shutdown. Not likely the problem but not great either.

certmonger had 25 seconds to start tracking the cert.

2020-10-07T23:45:56Z DEBUG Loading StateFile from '/var/lib/ipa/sysrestore/sysrestore.state'
2020-10-07T23:46:21Z DEBUG Traceback (most recent call last):
...
reply_message = self.send_message_with_reply_and_block(
dbus.exceptions.DBusException: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

It failed starting tracking on the first cert, auditSigningCert cert-pki-ca.

It might be helpful to bump up the certmonger debug level to 9 to get more insight into what is going on.

Stanislav Levin ran into this in ALTLinux and F32 and thought it was related to the pending certmonger 0.79.12-1. This test failed with certmonger 0.79.11-2.

certmonger 0.79.11-2 was committed on July 1 so this is looking less like a certmonger issue and more like another problem. The dbus packages are from the same time period.

I addressed the "Error shutting down NSS" errors in https://pagure.io/certmonger/pull-request/171

As a test I created https://github.com/freeipa/freeipa/pull/5180 to play with extending the dbus timeout. It feels very much like a bandaid.

Will next look into the delay caused by the kinit.

I filed https://bugzilla.redhat.com/show_bug.cgi?id=1888009 against nss to look into 13-14 second delays in PK11_ListCertsInSlot().

That would resolve the installation problem. I'm still looking into a similar dbus timeout during uninstall.

The uninstall is failing to remove the IPA-added certmonger CA entries:

2020-10-13T19:20:05Z DEBUG Start of certmonger.service complete
[stack trace]
2020-10-13T19:20:32Z DEBUG The ipa-server-install command failed, exception: DBusException: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

And the journal has:

Oct 13 19:20:07 master.ipa.test certmonger[32012]: 2020-10-13 19:20:07 [32012] User ID 0 PID 31810 called /org/fedorahosted/certmonger:org.fedorahosted.certmonger.remove_known_ca.
Oct 13 19:20:20 master.ipa.test certmonger[32021]: Error obtaining initial credentials: Cannot contact any KDC for requested realm.
Oct 13 19:20:20 master.ipa.test certmonger[32021]: Error setting up ccache at the client: Cannot contact any KDC for requested realm.
Oct 13 19:20:33 master.ipa.test certmonger[32012]: 2020-10-13 19:20:33 [32012] Waited for 32044, got 32044.

So 13 seconds trying to get a TGT and another doing...something.

I think it may be related to the service restart immediately beforehand.

Metadata Update from @rcritten:
- Issue assigned to rcritten

3 years ago

master:

  • 139d60d Don't restart certmonger after stopping tracking in uninstall

ipa-4-8:

  • a9aad49 Don't restart certmonger after stopping tracking in uninstall

Metadata Update from @rcritten:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

3 years ago

The issue is still happening in test_integration/test_ipahealthcheck.py::TestIpaHealthCheckWithExternalCA::test_ipahealthcheck_crlmanagercheck::setup, see in the PR #499.
Report, logs:

cls = <class 'ipatests.test_integration.test_ipahealthcheck.TestIpaHealthCheckWithExternalCA'>
mh = <pytest_multihost.plugin.MultihostFixture object at 0x7f2337d3f8b0>

    @classmethod
    def install(cls, mh):
        result = install_server_external_ca_step1(cls.master)
        assert result.returncode == 0
        root_ca_fname, ipa_ca_fname = tasks.sign_ca_and_transport(
            cls.master, paths.ROOT_IPA_CSR, ROOT_CA, IPA_CA
        )

        install_server_external_ca_step2(
            cls.master, ipa_ca_fname, root_ca_fname
        )
        tasks.kinit_admin(cls.master)
        tasks.install_packages(cls.master, HEALTHCHECK_PKG)
        tasks.install_packages(cls.replicas[0], HEALTHCHECK_PKG)
>       tasks.install_replica(cls.master, cls.replicas[0])

test_integration/test_ipahealthcheck.py:1720: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pytest_ipa/integration/tasks.py:462: in install_replica
    result = replica.run_command(args, raiseonerr=raiseonerr,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <ipatests.pytest_ipa.integration.host.Host replica0.ipa.test (replica)>
argv = ['ipa-replica-install', '--admin-password', 'Secret.123', '--password', 'Secret.123', '-U', ...]
set_env = True, stdin_text = None, log_stdout = True, raiseonerr = True
cwd = None, bg = False, encoding = 'utf-8', ok_returncode = 0

    def run_command(self, argv, set_env=True, stdin_text=None,
                    log_stdout=True, raiseonerr=True,
                    cwd=None, bg=False, encoding='utf-8', ok_returncode=0):
        """Wrapper around run_command to log stderr on raiseonerr=True

        :param ok_returncode: return code considered to be correct,
                              you can pass an integer or sequence of integers
        """
        result = super().run_command(
            argv, set_env=set_env, stdin_text=stdin_text,
            log_stdout=log_stdout, raiseonerr=False, cwd=cwd, bg=bg,
            encoding=encoding
        )
        # in FIPS mode SSH may print noise to stderr, remove the string
        # "FIPS mode initialized" + optional newline.
        result.stderr_bytes = FIPS_NOISE_RE.sub(b'', result.stderr_bytes)
        try:
            result_ok = result.returncode in ok_returncode
        except TypeError:
            result_ok = result.returncode == ok_returncode
        if not result_ok and raiseonerr:
            result.log.error('stderr: %s', result.stderr_text)
>           raise subprocess.CalledProcessError(
                result.returncode, argv,
                result.stdout_text, result.stderr_text
            )
E           subprocess.CalledProcessError: Command '['ipa-replica-install', '--admin-password', 'Secret.123', '--password', 'Secret.123', '-U', '--setup-ca', '--ip-address', '192.168.122.97', '--realm', 'IPA.TEST', '--domain', 'ipa.test']' returned non-zero exit status 1.

pytest_ipa/integration/host.py:200: CalledProcessError
[...]
DEBUG    ipatests.pytest_ipa.integration.host.Host.replica0.cmd54:transport.py:557   [24/28]: configure certificate renewals
DEBUG    ipatests.pytest_ipa.integration.host.Host.replica0.cmd54:transport.py:557   [error] DBusException: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
DEBUG    ipatests.pytest_ipa.integration.host.Host.replica0.cmd54:transport.py:557 org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
DEBUG    ipatests.pytest_ipa.integration.host.Host.replica0.cmd54:transport.py:557 The ipa-replica-install command failed. See /var/log/ipareplica-install.log for more information
DEBUG    ipatests.pytest_ipa.integration.host.Host.replica0.cmd54:transport.py:557 Your system may be partly configured.
DEBUG    ipatests.pytest_ipa.integration.host.Host.replica0.cmd54:transport.py:557 Run /usr/sbin/ipa-server-install --uninstall to clean up.
DEBUG    ipatests.pytest_ipa.integration.host.Host.replica0.cmd54:transport.py:557 
DEBUG    ipatests.pytest_ipa.integration.host.Host.replica0.cmd54:transport.py:217 Exit code: 1

Metadata Update from @frenaud:
- Issue status updated to: Open (was: Closed)

3 years ago

This depends on certmonger-0.79.13 which is still in updates-testing.

Could it be an issue with shutdown order? 389-DS and KDC are stopped before certmonger services are removed. Certmonger wants to acquire a ticket and fails.

journald

Feb 01 13:40:41 master.ipa.test certmonger[30602]: Error obtaining initial credentials: Cannot contact any KDC for requested realm.
Feb 01 13:40:41 master.ipa.test certmonger[30602]: Error setting up ccache at the client: Cannot contact any KDC for requested realm.

dirsrv error log

[01/Feb/2021:13:40:12.025238535 +0000] - INFO - ldbm_back_instance_set_destructor - Set of instances destroyed
[01/Feb/2021:13:40:12.027851804 +0000] - INFO - connection_post_shutdown_cleanup - slapd shutting down - freed 2 work q stack objects - freed 8 op stack objects
[01/Feb/2021:13:40:12.028988317 +0000] - INFO - main - slapd stopped.

krb5kdc log

Feb 01 13:40:09 master.ipa.test krb5kdc[21748](info): shutting down
Feb 01 13:40:09 master.ipa.test krb5kdc[21748](info): IPA certauth plugin un-loaded.

uninstall log

2021-02-01T13:40:25Z DEBUG Start of certmonger.service complete
2021-02-01T13:40:50Z DEBUG   File "/usr/lib/python3.9/site-packages/ipapython/admintool.py", line 180, in execute
...
  File "/usr/lib/python3.9/site-packages/ipaserver/install/cainstance.py", line 1007, in uninstall
    iface.remove_known_ca(path)
  File "/usr/lib64/python3.9/site-packages/dbus/proxies.py", line 141, in __call__
    return self._connection.call_blocking(self._named_service,
  File "/usr/lib64/python3.9/site-packages/dbus/connection.py", line 652, in call_blocking
    reply_message = self.send_message_with_reply_and_block(

2021-02-01T13:40:50Z DEBUG The ipa-server-install command failed, exception: DBusException: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
2021-02-01T13:40:50Z ERROR org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
2021-02-01T13:40:50Z ERROR The ipa-server-install command failed. See /var/log/ipaserver-uninstall.log for more information

Logs are from http://freeipa-org-pr-ci.s3-website.eu-central-1.amazonaws.com/jobs/4e47a660-648f-11eb-93cf-fa163e9d1aeb/test_integration-test_sudo.py-TestSudo-uninstall/master.ipa.test/

master:

  • 71047f6 Remove the option stop_certmonger from stop_tracking_*

ipa-4-9:

  • 9872610 Remove the option stop_certmonger from stop_tracking_*

Metadata Update from @abbra:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

3 years ago

The issue is still observed: the message "DBusException: org.freedesktop.DBus.Error.NoReply: Did not receive a reply." occurs during IPA replica installation during "configure certificate renewals":

DEBUG    ipatests.pytest_ipa.integration.host.Host.replica0.cmd77:transport.py:557   [21/30]: Finalize replication settings
DEBUG    ipatests.pytest_ipa.integration.host.Host.replica0.cmd77:transport.py:557   [22/30]: configure certmonger for renewals
DEBUG    ipatests.pytest_ipa.integration.host.Host.replica0.cmd77:transport.py:557   [23/30]: Importing RA key
DEBUG    ipatests.pytest_ipa.integration.host.Host.replica0.cmd77:transport.py:557   [24/30]: configure certificate renewals
DEBUG    ipatests.pytest_ipa.integration.host.Host.replica0.cmd77:transport.py:557   [error] DBusException: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
DEBUG    ipatests.pytest_ipa.integration.host.Host.replica0.cmd77:transport.py:557 org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
DEBUG    ipatests.pytest_ipa.integration.host.Host.replica0.cmd77:transport.py:557 The ipa-replica-install command failed. See /var/log/ipareplica-install.log for more information
DEBUG    ipatests.pytest_ipa.integration.host.Host.replica0.cmd77:transport.py:557 Your system may be partly configured.
DEBUG    ipatests.pytest_ipa.integration.host.Host.replica0.cmd77:transport.py:557 Run /usr/sbin/ipa-server-install --uninstall to clean up.
DEBUG    ipatests.pytest_ipa.integration.host.Host.replica0.cmd77:transport.py:557 
DEBUG    ipatests.pytest_ipa.integration.host.Host.replica0.cmd77:transport.py:217 Exit code: 1
ERROR    ipatests.pytest_ipa.integration.host.Host.replica0.cmd77:host.py:201 stderr: Lookup failed: Preferred host replica0.ipa.test does not provide DNS.
Reverse DNS resolution of address 192.168.122.188 (master.ipa.test) failed. Clients may not function properly. Please check your DNS setup. (Note that this check queries IPA DNS directly and ignores /etc/hosts.)
Replica DNS records could not be added on master: Insufficient access: Insufficient 'add' privilege to add the entry 'idnsname=replica0,idnsname=ipa.test.,cn=dns,dc=ipa,dc=test'.
Custodia uses 'master.ipa.test' as master peer.
org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
The ipa-replica-install command failed. See /var/log/ipareplica-install.log for more information

Nightly PR: https://github.com/freeipa-pr-ci2/freeipa/pull/1156
Report: http://freeipa-org-pr-ci.s3-website.eu-central-1.amazonaws.com/jobs/252465fa-0f6d-11ec-a54d-fa163e3ef39f/report.html

Metadata Update from @sorlov:
- Issue status updated to: Open (was: Closed)

2 years ago

@sorlov thanks for noticing the issue. We are tracking the new problem at https://pagure.io/freeipa/issue/8870, hence closing this one

Metadata Update from @frenaud:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

2 years ago

Login to comment on this ticket.

Metadata