As a FreeIPA server admin, I want ipa-server-upgrade to complete successfully.
On my systems ipa-server-upgrade failed when upgrading from 4.9.8 to 4.9.11.
ipa-server-upgrade
Following are lots of supporting logs, leading to the root cause.
Running it manually showed:
IPA server upgrade failed: Inspect /var/log/ipaupgrade.log and run command ipa-server-upgrade manually. Unexpected error - see /var/log/ipaupgrade.log for details: RemoteRetrieveError: Failed to authenticate to CA REST API The ipa-server-upgrade command failed. See /var/log/ipaupgrade.log for more information
/var/log/ipaupgrade.log showed:
/var/log/ipaupgrade.log
2023-05-22T21:50:18Z INFO Migrating profile 'caECServerCertWithSCT' 2023-05-22T21:50:18Z DEBUG request GET https://ipa01.qa.example.net:8443/ca/rest/account/login 2023-05-22T21:50:18Z DEBUG response status 404 2023-05-22T21:50:18Z DEBUG response body (decoded): b'...HTTP Status 404 \xe2\x80\x93 Not Found...'
Manually curling https://ipa01.qa.example.net:8443/ca/rest/account/login afterwards showed the 404 was still happening.
https://ipa01.qa.example.net:8443/ca/rest/account/login
journalctl --unit pki-tomcatd@pki-tomcat.service showed
journalctl --unit pki-tomcatd@pki-tomcat.service
May 22 19:44:10 ipa01.qa.example.net server[3309172]: SEVERE: One or more listeners failed to start. Full details will be found in the appropriate container log file May 22 19:44:10 ipa01.qa.example.net server[3309172]: SEVERE: Context [/ca] startup failed due to previous errors May 22 19:44:12 ipa01.qa.example.net server[3309172]: SEVERE: One or more listeners failed to start. Full details will be found in the appropriate container log file May 22 19:44:12 ipa01.qa.example.net server[3309172]: SEVERE: Context [/acme] startup failed due to previous errors
/var/log/pki/pki-tomcat/ca/debug.2023-05-22.log and /var/log/pki/pki-tomcat/acme/debug.2023-05-22.log showed:
/var/log/pki/pki-tomcat/ca/debug.2023-05-22.log
/var/log/pki/pki-tomcat/acme/debug.2023-05-22.log
2023-05-22 19:44:10 [main] FINE: LdapBoundConnection: Connecting to ipa01.qa.example.net:636 with client cert auth 2023-05-22 19:44:10 [main] SEVERE: Unable to create socket: java.net.ConnectException: Connection refused (Connection refused)
2023-05-22 19:44:11 [main] INFO: Loading LDAP database configuration from /etc/pki/pki-tomcat/ca/CS.cfg 2023-05-22 19:44:12 [main] SEVERE: Unable to create socket: java.net.ConnectException: Connection refused (Connection refused)
But I was able to telnet to that LDAP endpoint no problem using telnet ipa01.qa.example.net 636 Running systemctl restart pki-tomcatd@pki-tomcat.service made the 8443 curl work, but the next ipa-server-upgrade run still failed and pki-tomcat was unhappy again.
telnet ipa01.qa.example.net 636
systemctl restart pki-tomcatd@pki-tomcat.service
pki-tomcat
From /var/log/ipaupgrade.log and /var/log/dirsrv/slapd-QA-EXAMPLE-NET/errors, I could see that slapd was being stopped and started right at the time that pki-tomcat's ca and acme containers were trying to connect to slapd. As soon as they were unable to connect, they started returning 404 forever (until someone ran systemctl restart pki-tomcatd@pki-tomcat.service).
/var/log/dirsrv/slapd-QA-EXAMPLE-NET/errors
slapd
ca
acme
See below for my workaround.
It fails to complete, showing the RemoteRetrieveError error above.
RemoteRetrieveError
It completes successfully.
package freeipa-server is not installed package freeipa-client is not installed ipa-server-4.9.11-5.module+el8.8.0+18146+a1d8660b.x86_64 ipa-client-4.9.11-5.module+el8.8.0+18146+a1d8660b.x86_64 389-ds-base-1.4.3.34-1.module+el8.7.0+18367+58a49cb0.x86_64 package pki-ca is not installed krb5-server-1.18.2-22.el8_7.x86_64
Edit /usr/lib/python3.6/site-packages/ipaserver/install/server/upgrade.py. At about line 1787, inside upgrade_configuration(), after the kra.start('pki-tomcat'), and before the certmonger_service = services.knownservices.certmonger, add this:
/usr/lib/python3.6/site-packages/ipaserver/install/server/upgrade.py
1787
upgrade_configuration()
kra.start('pki-tomcat')
certmonger_service = services.knownservices.certmonger
logger.info('Sleeping 10 seconds for pki-tomcat ca and acme to finish starting up') time.sleep(10)
Also, put import time at the top. Then, run ipa-server-upgrade. This time it completes successfully.
import time
These systems are AWS EC2 instances of size m5.large, with 2 vCPU cores and 8 GiB of memory. They are running Red Hat Enterprise Linux 8.8 (Ootpa), Linux 4.18.0-240.15.1.el8_3.x86_64.
I can see that in [Migrate CRL publish directory] the script waits for ports 8080 and 8443 to be connectable after starting pki-tomcat before it goes on, but at that point the ca and acme containers haven't finished starting yet. The next step ([Verifying that KDC configuration is using ipa-kdb backend]) briefly stops dirsrv (ns-slapd), and on my systems it happens to line up perfectly with ca and acme trying to connect to slapd. And it seems those two Tomcat containers do not retry after getting failures, and just respond with 404 forever instead.
[Migrate CRL publish directory]
[Verifying that KDC configuration is using ipa-kdb backend]
dirsrv
ns-slapd
I'm not sure what the right solution is (specifically code the pki-tomcat to also wait for https://ipa01.qa.example.net:8443/ca/rest/account/login to be working?), but I would like to see this race condition get addressed.
@pgn674 can you check if there is a drop-in file on your server:
# cat /etc/systemd/system/pki-tomcatd@pki-tomcat.service.d/ipa.conf [Service] Environment=LC_ALL=C.UTF-8 ExecStartPost=/usr/libexec/ipa/ipa-pki-wait-running
If not, create the directory + file manually, launch systemctl --system daemon-reload then ipa-server-upgrade. This file ensures that PKI is ready (CA subsystem available) before the start command returns. On new installations (post RHEL 8.1), this file is created at install time but it looks like an upgrade from 8.0 to 8.1+ does not create it, resulting in PKI not fully initialized when the upgrade is run.
Metadata Update from @frenaud: - Issue assigned to frenaud
Metadata Update from @frenaud: - Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=2215336
Issue linked to bug 2215336
Metadata Update from @frenaud: - Custom field on_review adjusted to https://github.com/freeipa/freeipa/pull/6889
Metadata Update from @frenaud: - Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=2215336, https://bugzilla.redhat.com/show_bug.cgi?id=2216114 (was: https://bugzilla.redhat.com/show_bug.cgi?id=2215336)
master:
ipa-4-10:
ipa-4-9:
Metadata Update from @frenaud: - Issue close_status updated to: fixed - Issue status updated to: Closed (was: Open)
Hi, I know it's been a long time but I'm returning just to answer the question.
I have not run any FreeIPA upgrade on the server since I filed this bug. Currently, /etc/systemd/system/pki-tomcatd@pki-tomcat.service.d/ipa.conf and its folder do not exist on the server. Also, this instance was started from an AMI (Amazon Machine Image) named RHEL-8.0.0_HVM-20190426-x86_64-1-Hourly2-GP2, which I think was the latest RHEL 8 AMI as of April of 2019.
/etc/systemd/system/pki-tomcatd@pki-tomcat.service.d/ipa.conf
RHEL-8.0.0_HVM-20190426-x86_64-1-Hourly2-GP2
So it looks like your RHEL 8.0 to 8.1 in-place upgrade thought hit the root cause right on.
I don't see any reason to change this issue's status from Closed: fixed to anything else. Thanks for the fix!
Closed: fixed
Log in to comment on this ticket.