#8748 [Tracker] Random nightly failure in replica install: timeout Waiting for CA subsystem to start
Closed: worksforme 2 years ago by fcami. Opened 3 years ago by frenaud.

The nightly test test_installation.py::TestInstallCA::test_replica_ca_install_with_no_host_dns failed in PR #767 [testing_fedora_latest] on fedora 33, with the following logs and report:

self = <ipatests.test_integration.test_installation.TestInstallCA object at 0x7fa2f6dd2820>

    def test_replica_ca_install_with_no_host_dns(self):
        """
        Test for ipa-ca-install --no-host-dns on a replica
        """

        tasks.install_replica(self.master, self.replicas[0], setup_ca=False)
>       tasks.install_ca(self.replicas[0], extra_args=["--no-host-dns"])

Test scenario:
- install server with embedded CA
- install replica without CA
- install the CA clone on the replica with ipa-ca-install --no-host-dns

The command ipa-ca-install --no-host-dns failed on the replica, in the pkispawn step restarting the CA, see pki-ca-spawn.log:

2021-03-11 00:24:04 INFO: Stopping server
2021-03-11 00:24:04 DEBUG: Command: systemctl stop pki-tomcatd@pki-tomcat.service
2021-03-11 00:24:06 INFO: Removing temp SSL server cert from internal token: Server-Cert cert-pki-ca
2021-03-11 00:24:06 DEBUG: Command: certutil -D -d /etc/pki/pki-tomcat/alias -f /tmp/tmpbl_1jysp/password.txt -n Server-Cert cert-pki-ca
2021-03-11 00:24:06 INFO: Importing permanent SSL server cert into internal token: Server-Cert cert-pki-ca
2021-03-11 00:24:06 DEBUG: Command: certutil -A -d /etc/pki/pki-tomcat/alias -f /tmp/tmpz1vmg_os/internal_password.txt -n Server-Cert cert-pki-ca -a -i /tmp/tmp7qabnqjn/sslserver.crt -t ,,
2021-03-11 00:24:06 INFO: Starting server
2021-03-11 00:24:06 DEBUG: Command: systemctl start pki-tomcatd@pki-tomcat.service
2021-03-11 00:24:09 INFO: FIPS mode: False
2021-03-11 00:24:10 INFO: Waiting for CA subsystem to start (1s)
...
2021-03-11 00:25:09 INFO: Waiting for CA subsystem to start (60s)
2021-03-11 00:25:10 ERROR: Exception: CA subsystem did not start after 60s
  File "/usr/lib/python3.9/site-packages/pki/server/pkispawn.py", line 575, in main
    scriptlet.spawn(deployer)
  File "/usr/lib/python3.9/site-packages/pki/server/deployment/scriptlets/configuration.py", line 1181, in spawn
    deployer.instance.wait_for_startup(
  File "/usr/lib/python3.9/site-packages/pki/server/deployment/pkihelper.py", line 890, in wait_for_startup
    raise Exception('%s subsystem did not start after %ds' %

The journal has a few warnings when stopping pki:

Mar 11 00:24:04 replica1.ipa.test systemd[1]: Stopping PKI Tomcat Server pki-tomcat...
Mar 11 00:24:04 replica1.ipa.test systemd-resolved[17382]: Using degraded feature set TCP instead of UDP for DNS server 192.168.122.245.
Mar 11 00:24:04 replica1.ipa.test systemd-resolved[17382]: Using degraded feature set UDP instead of TCP for DNS server 192.168.122.245.
Mar 11 00:24:04 replica1.ipa.test server[20573]: Java virtual machine used: /usr/bin/java
Mar 11 00:24:04 replica1.ipa.test server[20573]: classpath used: /usr/share/tomcat/bin/bootstrap.jar:/usr/share/tomcat/bin/tomcat-juli.jar:/usr/lib/java/commons-daemon.jar
Mar 11 00:24:04 replica1.ipa.test server[20573]: main class used: org.apache.catalina.startup.Bootstrap
Mar 11 00:24:04 replica1.ipa.test server[20573]: flags used: -Dcom.redhat.fips=false
Mar 11 00:24:04 replica1.ipa.test server[20573]: options used: -Dcatalina.base=/var/lib/pki/pki-tomcat -Dcatalina.home=/usr/share/tomcat -Djava.endorsed.dirs= -Djava.io.tmpdir=/var/lib/pki/pki-tomcat/temp -Djava.util.logging.config.file=/var/lib/pki/pki-tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
Mar 11 00:24:04 replica1.ipa.test server[20573]: arguments used: stop
...
Mar 11 00:24:05 replica1.ipa.test server[20401]: WARNING: Please consider reporting this to the maintainers of org.apache.catalina.loader.WebappClassLoaderBase
Mar 11 00:24:05 replica1.ipa.test server[20401]: WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
Mar 11 00:24:05 replica1.ipa.test server[20401]: WARNING: All illegal access operations will be denied in a future release
Mar 11 00:24:05 replica1.ipa.test audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pki-tomcatd@pki-tomcat comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Mar 11 00:24:05 replica1.ipa.test systemd[1]: pki-tomcatd@pki-tomcat.service: Succeeded.
Mar 11 00:24:05 replica1.ipa.test systemd[1]: Stopped PKI Tomcat Server pki-tomcat.
Mar 11 00:24:05 replica1.ipa.test systemd[1]: pki-tomcatd@pki-tomcat.service: Consumed 20.360s CPU time.

and when starting pki:

Mar 11 00:25:15 replica1.ipa.test server[20748]: WARNING: Some of the specified [protocols] are not supported by the SSL engine and have been skipped: [[TLSv1, TLSv1.1]]
Mar 11 00:25:15 replica1.ipa.test server[20748]: SEVERE: Failed to initialize component [Connector[AJP/1.3-8009]]
Mar 11 00:25:15 replica1.ipa.test server[20748]: org.apache.catalina.LifecycleException: Protocol handler initialization failed
Mar 11 00:25:15 replica1.ipa.test server[20748]:         at org.apache.catalina.connector.Connector.initInternal(Connector.java:1049)
Mar 11 00:25:15 replica1.ipa.test server[20748]:         at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:136)
Mar 11 00:25:15 replica1.ipa.test server[20748]:         at org.apache.catalina.core.StandardService.initInternal(StandardService.java:558)
Mar 11 00:25:15 replica1.ipa.test server[20748]:         at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:136)
Mar 11 00:25:15 replica1.ipa.test server[20748]:         at org.apache.catalina.core.StandardServer.initInternal(StandardServer.java:1045)
Mar 11 00:25:15 replica1.ipa.test server[20748]:         at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:136)
Mar 11 00:25:15 replica1.ipa.test server[20748]:         at org.apache.catalina.startup.Catalina.load(Catalina.java:724)
Mar 11 00:25:15 replica1.ipa.test server[20748]:         at org.apache.catalina.startup.Catalina.load(Catalina.java:746)
Mar 11 00:25:15 replica1.ipa.test server[20748]:         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Mar 11 00:25:15 replica1.ipa.test server[20748]:         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Mar 11 00:25:15 replica1.ipa.test server[20748]:         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Mar 11 00:25:15 replica1.ipa.test server[20748]:         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
Mar 11 00:25:15 replica1.ipa.test server[20748]:         at org.apache.catalina.startup.Bootstrap.load(Bootstrap.java:302)
Mar 11 00:25:15 replica1.ipa.test server[20748]:         at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:472)
Mar 11 00:25:15 replica1.ipa.test server[20748]: Caused by: java.net.BindException: Address already in use
Mar 11 00:25:15 replica1.ipa.test server[20748]:         at java.base/sun.nio.ch.Net.bind0(Native Method)
Mar 11 00:25:15 replica1.ipa.test server[20748]:         at java.base/sun.nio.ch.Net.bind(Net.java:455)
Mar 11 00:25:15 replica1.ipa.test server[20748]:         at java.base/sun.nio.ch.Net.bind(Net.java:447)
Mar 11 00:25:15 replica1.ipa.test server[20748]:         at java.base/sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:227)
Mar 11 00:25:15 replica1.ipa.test server[20748]:         at org.apache.tomcat.util.net.NioEndpoint.initServerSocket(NioEndpoint.java:277)
Mar 11 00:25:15 replica1.ipa.test server[20748]:         at org.apache.tomcat.util.net.NioEndpoint.bind(NioEndpoint.java:241)
Mar 11 00:25:15 replica1.ipa.test server[20748]:         at org.apache.tomcat.util.net.AbstractEndpoint.bindWithCleanup(AbstractEndpoint.java:1193)
Mar 11 00:25:15 replica1.ipa.test server[20748]:         at org.apache.tomcat.util.net.AbstractEndpoint.init(AbstractEndpoint.java:1206)
Mar 11 00:25:15 replica1.ipa.test server[20748]:         at org.apache.coyote.AbstractProtocol.init(AbstractProtocol.java:597)
Mar 11 00:25:15 replica1.ipa.test server[20748]:         at org.apache.catalina.connector.Connector.initInternal(Connector.java:1046)
Mar 11 00:25:15 replica1.ipa.test server[20748]:         ... 13 more

Also happened in PR #770, in test_replica_promotion_TestReplicaInstallCustodia.

Issue observed in [testing_ipa-4.9_latest] Nightly PR #806 in test test_integration/test_installation.py::TestInstallCA::test_replica_ca_install_with_no_host_dns - report

Another occurrence in PR #869 in test test_integration/test_installation.py::TestInstallCA::test_replica_ca_install_with_skip_schema_check - report

Not seen in a while. Closing.

Metadata Update from @fcami:
- Issue close_status updated to: worksforme
- Issue status updated to: Closed (was: Open)

2 years ago

Login to comment on this ticket.

Metadata