#1471 RHDS rarely crashes/shuts down somewhere in a pkidestroy and pkispawn workflow when re-provisioning slaves
Closed: Duplicate. Opened 8 years ago by dminnich.

This has happened to me twice out of the hundreds of installs I've done.

The infrastructure is a master CA with RHDS running on a separate machine, and a slave CA with RHDS running on a separate machine: 4 machines (unique instances) in total. What I recall happening is being unhappy with the slave install for some reason, running pkidestroy, and then running pkispawn on the slave. The pkispawn never completes because it is unable to contact the LDAP server to set up the replication agreement. When I log in to the RHDS node that the slave points at, sure enough, RHDS is no longer running. Once I start it back up and run pkispawn on the slave again, things work as they should.

I get the feeling that it may have something to do with the replication agreements, but I can't reproduce it reliably enough and haven't spent the time digging through the logs to prove it. I'm not sure whether removing the replication agreement or creating it causes the crash.

It's also possible that it's something else, or that I'm doing something unusual that causes the problem. But I never interact directly with the RHDS box and nothing else performs LDAP operations against it, so it definitely seems like something in RHCS is causing the problem.

Should the problem occur again, or if I figure out how to reproduce it reliably, I'll flesh out this bug some more. Otherwise, I'm curious whether other people have experienced something similar and can provide more info. If there is no input after a while, feel free to close the bug.

This isn't a blocker or big deal for us. Just wanted to put it out there in case others are seeing it.


Per CS/DS meeting of 07/13/2015: 10.3

I just saw this happen again.
The setup was as follows. Each entity is a separate machine.
Master ca01 -> ldap01
Clone ca02 -> ldap02
Several KRAs, OCSPs and a third CA also existed, but I don't think any of them are relevant.

A full install of these components had taken place in the past and was working fine.
I decided to do a reinstall with a new version of RHCS. To do that, I ran pkidestroy and yum removed everything.

I then yum installed the latest release and ran pkispawn on ca01 and ca02. The pkispawn on ca02 failed because RHDS on ldap02 went down. I had not touched the LDAP server between the uninstall and the reinstall, and I confirmed that RHDS on ldap02 was running before I issued the pkispawn on ca02. So something in the pkispawn of a reinstalled clone CA seems to kill RHDS.

Note that this is happening with pki-ca-10.2.6-2 and redhat-ds-base-10.0.0-1.el7dsrv.x86_64.

Attached are:
pkispawn config of the clone
pkispawn log of the clone
debug log of the clone
access logs for rhds on ldap02
error logs for rhds on ldap02

Both the RHCS debug log and the RHDS error log mention VLV (Virtual List View).

It almost looks like RHCS tells RHDS to delete some data so that it can import it again. The problem is that RHDS shuts down to delete the data, so the RHCS install never finishes.
One thing I will mention is that if RHDS is supposed to go down and bring itself back up, the way we call pkispawn through Puppet may not allow a long enough wait for this to happen. I'd test this theory by running pkispawn directly, but I can't make this happen often enough and don't know the exact steps to trigger it.
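To test the wait-time theory, the Puppet run could poll the LDAP port before (or before retrying) pkispawn instead of proceeding immediately. Below is a minimal sketch, assuming a plain TCP reachability check is good enough: the helper name `wait_for_port` and its timeout/interval defaults are hypothetical and are not part of pkispawn, Puppet, or RHDS.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 300.0, interval: float = 5.0) -> bool:
    """Hypothetical helper: poll host:port until it accepts a TCP connection.

    Returns True as soon as a connection succeeds, or False once `timeout`
    seconds have elapsed without a successful connection.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful TCP handshake only shows the port is listening;
            # it does not prove the ldbm backend is back online.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused, DNS failure, or connect timeout: try again.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_port("intca02.ldap.qa.int.phx1.redhat.com", 636)` (host and port taken from the RHCS debug log below) would block until ldap02 listens on 636 again. If RHDS has actually stopped rather than restarting itself, this only times out; it does not start the server.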

RHDS:
[23/Jul/2015:15:10:39 +0000] - Deleted Virtual List View Search (caRenewal-pki-tomcat).
[23/Jul/2015:15:10:39 +0000] - Deleted Virtual List View Index.
[23/Jul/2015:15:10:39 +0000] - Deleted Virtual List View Index.
[23/Jul/2015:15:10:39 +0000] - Deleted Virtual List View Search (caRevocation-pki-tomcat).
[23/Jul/2015:15:10:39 +0000] - Deleted Virtual List View Search (caRevocation-pki-tomcat).
[23/Jul/2015:15:10:40 +0000] - ldbm: Bringing intca02.pki.qa.int.phx1.redhat.com offline...

RHCS:
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: initializing with mininum 3 and maximum 15 connections to host intca02.ldap.qa.int.phx1.redhat.com port 636, secure connection, true, authentication type 1
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: increasing minimum connections by 3
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: new total available connections 3
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: new number of connections 3
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: In LdapBoundConnFactory::getConn()
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: masterConn is connected: true
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: getConn: conn is connected true
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: getConn: mNumConns now 2
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: importLDIFS: param=preop.internaldb.post_ldif
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: importLDIFS(): ldif file = /usr/share/pki/ca/conf/vlv.ldif
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: importLDIFS(): ldif file copy to /var/lib/pki/pki-tomcat/ca/conf/vlv.ldif
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: importLDIFS(): LDAP Errors in importing /var/lib/pki/pki-tomcat/ca/conf/vlv.ldif
[23/Jul/2015:15:10:50]http-bio-8443-exec-10: LDAPUtil:importLDIF: exception in adding entry cn=allCerts-pki-tomcat, cn=intca02.pki.qa.int.phx1.redhat.com, cn=ldbm database, cn=plugins, cn=config:netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)

[23/Jul/2015:15:10:50]http-bio-8443-exec-10: LDAPUtil:importLDIF: exception in adding entry cn=allExpiredCerts-pki-tomcat, cn=intca02.pki.qa.int.phx1.redhat.com, cn=ldbm database, cn=plugins, cn=config: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)

[23/Jul/2015:15:10:50]http-bio-8443-exec-10: LDAPUtil:importLDIF: exception in adding entry cn=allInvalidCerts-pki-tomcat, cn=intca02.pki.qa.int.phx1.redhat.com, cn=ldbm database, cn=plugins, cn=config: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)

[23/Jul/2015:15:10:50]http-bio-8443-exec-10: LDAPUtil:importLDIF: exception in adding entry cn=allInValidCertsNotBefore-pki-tomcat, cn=intca02.pki.qa.int.phx1.redhat.com, cn=ldbm database, cn=plugins, cn=config:netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)

[23/Jul/2015:15:10:50]http-bio-8443-exec-10: LDAPUtil:importLDIF: exception in adding entry cn=allNonRevokedCerts-pki-tomcat, cn=intca02.pki.qa.int.phx1.redhat.com, cn=ldbm database, cn=plugins, cn=config: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)

Metadata Update from @dminnich:
- Issue set to the milestone: 10.3.0

7 years ago

Dogtag PKI is moving from Pagure issues to GitHub issues. This means that existing or new
issues will be reported and tracked through Dogtag PKI's GitHub Issue tracker.

This issue has been cloned to GitHub and is available here:
https://github.com/dogtagpki/pki/issues/2030

If you want to receive further updates on the issue, please navigate to the
GitHub issue and click the Subscribe button.

Thank you for understanding, and we apologize for any inconvenience.
