#1445 CA can not publish to KRA when configured with multiple KRAs
Closed: Fixed None Opened 8 years ago by dminnich.

The CA treats the space-separated list of KRAs as a single hostname.

If you remove the second KRA from the Security Domain in LDAP and from CS.cfg, things start working.

The setting has to go from
ca.connector.KRA.host=kra01.pki.dev.int.devlab.redhat.com:8443 kra02.pki.dev.int.devlab.redhat.com:8443

to

ca.connector.KRA.host=kra01.pki.dev.int.devlab.redhat.com

Note the removal of the :8443 port as well.

LDAP replication probably handles most things, but when one server is down the others should be tried in round-robin fashion. Please also make sure the OCSP code works in a similar fashion, if at all possible.
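
To illustrate the behavior being asked for, here is a minimal sketch in Python. It is not the Dogtag implementation. The function names, the default port, and the `send` callback are all hypothetical. It splits the space-separated host value into separate targets and tries them in order until one succeeds. A true round robin would also rotate the starting target between requests.

```python
from typing import Callable, List, Tuple

DEFAULT_PORT = 8443  # assumption: port to use when an entry carries none


def parse_hosts(value: str, default_port: int = DEFAULT_PORT) -> List[Tuple[str, int]]:
    """Split a space-separated 'host[:port]' list into (host, port) pairs."""
    targets = []
    for entry in value.split():
        host, sep, port = entry.partition(":")
        targets.append((host, int(port) if sep else default_port))
    return targets


def send_with_failover(targets: List[Tuple[str, int]],
                       send: Callable[[str, int], str]) -> str:
    """Try each target in order and return the first successful result."""
    errors = []
    for host, port in targets:
        try:
            return send(host, port)
        except OSError as exc:  # connection refused, timeout, etc.
            errors.append(f"{host}:{port}: {exc}")
    raise ConnectionError("all targets failed: " + "; ".join(errors))
```

With the value from this ticket, `parse_hosts` yields two separate targets instead of one invalid hostname, and `send_with_failover` only reports an error once every target has been tried.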


Per CS/DS Meeting of 6/29/2015: 10.2.6

This seems to be the issue:
https://fedorahosted.org/pki/ticket/891 Missing fail-over code in HttpConnection

The KRA issue is fixed in #891.

Could you provide the procedure to test OCSP failover?

OCSP failover in our case would be one node behind the load balancer being down. Since there is LDAP replication, one node going down wouldn't be a problem from an end user's perspective.

My mention of OCSP in this ticket is more about how publishing to it works. See https://access.redhat.com/documentation/en-US/Red_Hat_Certificate_System/8.1/html-single/Deploy_and_Install_Guide/index.html#cloning-ocsps after #14, and "For OCSPs, only the master OCSP receives CRL updates, and then the published CRLs are replicated to the clones." under 10.1.3 vs. 10.1.2. Cloning for Other Subsystems. Finally, see the picture at 10.1. About Cloning.

I don't know how it all works on the backend, but the picture and all of those sections make it sound like if KRA01 is down when a key is being escrowed, the key will instead be escrowed to KRA02 and later replicated to KRA01 when it's revived, and it's no big deal. Conversely, if OCSP01 is down when a key is revoked, the revocation won't be published to OCSP02, and the updates won't go out until OCSP01 is back up and another publishing time occurs.

We talked recently about the value of cloned OCSPs. Not using clones and setting up separate publishing agreements to each OCSP kind of fixes the HA publishing problem for OCSPs. One situation worth thinking about, though, is replication times. If LDAP replication and consumer initialization occur more frequently than the cron schedule somebody has set for OCSP publishing, I'd say the cloning may still be valuable. Otherwise, if KRA01 goes down for 30 minutes and we fire a criminal and only publish CRLs once a day, we'd have a split-brain situation for 24 hours behind the load balancer, where sometimes he could get in and sometimes he couldn't, depending on which OCSP backend he talked to.

Let me know if you need more info.

Per discussion with cfu and jmagne, OCSP failover is being handled separately.

Metadata Update from @dminnich:
- Issue assigned to edewata
- Issue set to the milestone: 10.2.6

7 years ago

Dogtag PKI is moving from Pagure issues to GitHub issues. This means that existing or new
issues will be reported and tracked through Dogtag PKI's GitHub Issue tracker.

This issue has been cloned to GitHub and is available here:
https://github.com/dogtagpki/pki/issues/2005

If you want to receive further updates on the issue, please navigate to the
GitHub issue and click the Subscribe button.

Thank you for understanding, and we apologize for any inconvenience.
