#7091 Certificate renewal scripts can crash and leave FreeIPA in a broken state
Opened 3 years ago by pvoborni. Modified a year ago

Ticket was cloned from Red Hat Bugzilla (product Fedora): Bug 1475541

Rob Crittenden kindly helped me diagnose and fix a serious problem on my
FreeIPA server today. Tomcat had suddenly started failing to authenticate when
trying to access the LDAP server (389). Rob figured out (if I understand
correctly) that it was down to a TLS certificate mismatch: the server
certificate Tomcat was expecting 389 to be using was not the same as the server
certificate it was *actually* using. Diagnosing this was not at all
straightforward, and fixing it involved some fairly advanced (for a
non-LDAP-specialist) ldapmodify stuff that again I probably couldn't have
worked out for myself.

We think this was ultimately caused by a problem with the certmonger-based
automatic certificate renewal stuff. It seems that an SELinux denial caused the
renew_ca_cert and stop_pkicad scripts (both part of FreeIPA) not to be able to
stop the pki-tomcatd service, and this failure caused renew_ca_cert to crash
(several times, in fact). I think this resulted in the inconsistent state (the
renewal process got as far as issuing the new cert and configuring 389 to use
it, but didn't manage to configure tomcat to expect it).

Of course, we should fix the SELinux policy so the scripts *aren't* prevented
from stopping the service, and I've filed a bug for that:
https://bugzilla.redhat.com/show_bug.cgi?id=1475528 . But I also thought I
should file this bug to see if the renewal process can be made safer - done
more atomically, so if it fails things are correctly rolled back and FreeIPA
left in a consistent state (and perhaps some kind of understandable alert
generated and logged / sent to the admin / whatever). And certainly it
shouldn't cause renew_ca_cert to just straight up *crash*, as it does.
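
To make the "more atomic" idea concrete, here is a minimal sketch of a renewal run as a list of steps, each paired with an undo action, so that a failure partway through reverts what was already applied and is logged instead of crashing. This is not how renew_ca_cert works today; `run_atomically`, `RenewalError`, and the step functions are all hypothetical names, and the "configuration" is just a dictionary standing in for the 389 and Tomcat settings.

```python
import logging

logger = logging.getLogger("renewal_sketch")  # hypothetical logger name


class RenewalError(Exception):
    """Raised after a failed step once rollback has been attempted."""


def run_atomically(steps):
    """Run (name, apply, undo) steps in order.

    If a step fails, undo the steps that already completed, in reverse
    order, then raise RenewalError instead of an unhandled traceback.
    """
    done = []
    for name, apply, undo in steps:
        try:
            apply()
        except Exception as exc:
            logger.error("step %r failed: %s; rolling back", name, exc)
            for prev_name, prev_undo in reversed(done):
                try:
                    prev_undo()
                except Exception:
                    # A failed rollback is logged; remaining undos still run.
                    logger.exception("rollback of %r failed", prev_name)
            raise RenewalError(f"renewal aborted at step {name!r}") from exc
        done.append((name, undo))


# Stand-in state: which certificate each component is configured with.
state = {"ldap": "old-cert", "tomcat": "old-cert"}


def update_ldap():
    state["ldap"] = "new-cert"


def revert_ldap():
    state["ldap"] = "old-cert"


def update_tomcat():
    # Simulates the failure from this report: the service could not be stopped.
    raise PermissionError("could not stop service")


def revert_tomcat():
    state["tomcat"] = "old-cert"


try:
    run_atomically([
        ("configure 389", update_ldap, revert_ldap),
        ("configure tomcat", update_tomcat, revert_tomcat),
    ])
except RenewalError as err:
    print(err, state)
```

In this sketch the second step fails, the first is undone, and both components end up on the old certificate again, which is the consistent state asked for above. A real implementation would also have to cope with an undo that itself fails, which is why that case is at least logged here.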

I'm attaching the full extract of journal messages from the renewal.

Metadata Update from @pvoborni:
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1475541

3 years ago

Metadata Update from @pvoborni:
- Issue priority set to: critical
- Issue set to the milestone: FreeIPA 4.7

3 years ago

Metadata Update from @rcritten:
- Issue set to the milestone: FreeIPA 4.7.1 (was: FreeIPA 4.7)

2 years ago

FreeIPA 4.7 has been released, moving to FreeIPA 4.7.1 milestone

Couple of comments about this one:

  • Any specific bugs should be opened and addressed as separate tickets.

  • There is no way to guarantee success, e.g. if LDAP goes down at exactly the time the post-save script starts. I see Health Check, with relevant checks, as addressing the results of such a scenario.

  • It's not clear how this issue, as written, should be addressed (given the SELinux issue was filed separately).
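
For illustration only, a check of the kind mentioned above could compare the certificate one component expects with the one the other component actually presents, which is the mismatch described in the original report. The sketch below is not the Health Check API: `fingerprint` and `check_cert_consistency` are hypothetical names, the result dictionary is an invented shape, and how the two DER-encoded certificates are obtained is left out entirely.

```python
import hashlib


def fingerprint(der_bytes: bytes) -> str:
    """Return the SHA-256 fingerprint of a DER-encoded certificate."""
    return hashlib.sha256(der_bytes).hexdigest()


def check_cert_consistency(expected_der: bytes, actual_der: bytes) -> dict:
    """Compare the expected certificate with the one actually in use.

    Returns a small result dictionary (hypothetical shape) that a caller
    could log or report to an administrator.
    """
    expected = fingerprint(expected_der)
    actual = fingerprint(actual_der)
    if expected == actual:
        return {"result": "SUCCESS"}
    return {
        "result": "ERROR",
        "msg": "certificate in use does not match the expected certificate",
        "expected": expected,
        "actual": actual,
    }


# Placeholder bytes stand in for real DER data in this example.
print(check_cert_consistency(b"cert-A", b"cert-A")["result"])
print(check_cert_consistency(b"cert-A", b"cert-B")["result"])
```

A check like this does not prevent the inconsistent state, but it turns a hard-to-diagnose authentication failure into an explicit, reportable error.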
