#9096 Nightly test failure in testing_master_pki: certificate not retrieved on replica
Closed: fixed 2 years ago by abbra. Opened 2 years ago by amore.

The nightly testing_master_pki::test_integration/test_replica_promotion.py::TestSubCAkeyReplication::test_subca_master is failing.

Logs in PR 1462
- report
- logs

Test scenario:
- install server with DNS, CA,
- install replica with CA
- On server add ca ipa ca-add test_subca_master --subject 'cn=test_subca_master --desc='subca'
- Test on server and replica ipa ca-show test_subca_master

On replica ca is not retrieved and fails with error ipa: ERROR: The certificate for test_subca_master is not available on this server.


Metadata Update from @frenaud:
- Issue tagged with: test-failure

2 years ago

Metadata Update from @rcritten:
- Issue assigned to rcritten

2 years ago

What I can't yet figure out is that the CA is successfully returned and it also appears to have the failure message. From the log:

INFO ipatests.pytest_ipa.integration.host.Host.replica0.IPAOpenSSHTransport:transport.py:391 RUN ['ipa', 'ca-show', 'test_subca_master'] DEBUG ipatests.pytest_ipa.integration.host.Host.replica0.cmd89:transport.py:513 RUN ['ipa', 'ca-show', 'test_subca_master'] DEBUG ipatests.pytest_ipa.integration.host.Host.replica0.cmd89:transport.py:557 ipa: ERROR: The certificate for test_subca_master is not available on this server. DEBUG ipatests.pytest_ipa.integration.host.Host.replica0.cmd89:transport.py:557 Name: test_subca_master DEBUG ipatests.pytest_ipa.integration.host.Host.replica0.cmd89:transport.py:557 Description: subca DEBUG ipatests.pytest_ipa.integration.host.Host.replica0.cmd89:transport.py:557 Authority ID: 9b6d54d4-4aa0-406f-991f-9d1f745e624b DEBUG ipatests.pytest_ipa.integration.host.Host.replica0.cmd89:transport.py:557 Subject DN: CN=test_subca_master DEBUG ipatests.pytest_ipa.integration.host.Host.replica0.cmd89:transport.py:557 Issuer DN: CN=Certificate Authority,O=IPA.TEST DEBUG ipatests.pytest_ipa.integration.host.Host.replica0.cmd89:transport.py:217 Exit code: 0

It both says that the CA isn't present and it displays the output. Perhaps a side-effect of the ca-show where the data is in both IPA and pki.

There is a single query for this CA in the 389-ds logs and it returns a single entry as expected.

The CA logs confirm that the certificate is not (yet) present:

2022-01-24 08:02:14 [ajp-nio-0:0:0:0:0:0:0:1-8009-exec-1] INFO: PKIRealm: User ipara authenticated
2022-01-24 08:02:14 [AuthorityMonitor] SEVERE: Object certificate not found. Error Certificate not found: caSigningCert cert-pki-ca 9b6d54d4-4aa0-406f-991f-9d1f745e624b: Certificate not found: caSigningCert cert-pki-ca 9b6d54d4-4aa0-406f-991f-9d1f745e624b
Certificate not found: caSigningCert cert-pki-ca 9b6d54d4-4aa0-406f-991f-9d1f745e624b: Certificate not found: caSigningCert cert-pki-ca 9b6d54d4-4aa0-406f-991f-9d1f745e624b
at com.netscape.ca.CASigningUnit.init(CASigningUnit.java:94)
at com.netscape.ca.CertificateAuthority.initCertSigningUnit(CertificateAuthority.java:1323)
at com.netscape.ca.CertificateAuthority.init(CertificateAuthority.java:413)
at org.dogtagpki.server.ca.CAEngine.readAuthority(CAEngine.java:1442)
at com.netscape.ca.AuthorityMonitor.run(AuthorityMonitor.java:140)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: org.mozilla.jss.crypto.ObjectNotFoundException: Certificate not found: caSigningCert cert-pki-ca 9b6d54d4-4aa0-406f-991f-9d1f745e624b
at org.mozilla.jss.CryptoManager.findCertByNicknameNative(Native Method)
at org.mozilla.jss.CryptoManager.findCertByNickname(CryptoManager.java:906)
at com.netscape.ca.CASigningUnit.init(CASigningUnit.java:90)
... 5 more

2022-01-24 08:02:14 [AuthorityMonitor] WARNING: CertificateAuthority: CA signing key and cert not (yet) present in NSS database
2022-01-24 08:02:14 [AuthorityMonitor] INFO: CertificateAuthority: Starting KeyRetriever for authority 9b6d54d4-4aa0-406f-991f-9d1f745e624b

8 seconds later it seems to have been added. So maybe a scheduling thing? Note that the test currently has a 15 second sleep after adding the subCA.

2022-01-24 08:02:23 [KeyRetrieverRunner-9b6d54d4-4aa0-406f-991f-9d1f745e624b] INFO: CertificateAuthority: reinitializing signing units in KeyRetrieverRunner
2022-01-24 08:02:23 [KeyRetrieverRunner-9b6d54d4-4aa0-406f-991f-9d1f745e624b] INFO: CertificateAuthority: Initializing cert signing unit
2022-01-24 08:02:23 [KeyRetrieverRunner-9b6d54d4-4aa0-406f-991f-9d1f745e624b] INFO: SigningUnit: cert chain:
2022-01-24 08:02:23 [KeyRetrieverRunner-9b6d54d4-4aa0-406f-991f-9d1f745e624b] INFO: SigningUnit: - CN=test_subca_master
2022-01-24 08:02:23 [KeyRetrieverRunner-9b6d54d4-4aa0-406f-991f-9d1f745e624b] INFO: SigningUnit: - CN=Certificate Authority,O=IPA.TEST
2022-01-24 08:02:23 [KeyRetrieverRunner-9b6d54d4-4aa0-406f-991f-9d1f745e624b] INFO: CertificateAuthority: - nickname: caSigningCert cert-pki-ca 9b6d54d4-4aa0-406f-991f-9d1f745e624b
2022-01-24 08:02:23 [KeyRetrieverRunner-9b6d54d4-4aa0-406f-991f-9d1f745e624b] INFO: CertificateAuthority: Initializing CRL signing unit
2022-01-24 08:02:23 [KeyRetrieverRunner-9b6d54d4-4aa0-406f-991f-9d1f745e624b] INFO: CertificateAuthority: - nickname: caSigningCert cert-pki-ca 9b6d54d4-4aa0-406f-991f-9d1f745e624b
2022-01-24 08:02:23 [KeyRetrieverRunner-9b6d54d4-4aa0-406f-991f-9d1f745e624b] INFO: CertificateAuthority: Initializing OCSP signing unit
2022-01-24 08:02:23 [KeyRetrieverRunner-9b6d54d4-4aa0-406f-991f-9d1f745e624b] INFO: CertificateAuthority: - nickname: caSigningCert cert-pki-ca 9b6d54d4-4aa0-406f-991f-9d1f745e624b
2022-01-24 08:02:23 [KeyRetrieverRunner-9b6d54d4-4aa0-406f-991f-9d1f745e624b] INFO: CAEngine: tracking nsUniqueId: c7a0b695-7ceb11ec-8b1f9315-231f5a72
2022-01-24 08:02:23 [AuthorityMonitor] INFO: CAEngine: Loading authority record cn=9b6d54d4-4aa0-406f-991f-9d1f745e624b,ou=authorities,ou=ca,o=ipaca

@edewata do you know how sub CA's are loaded into pki? Should we be more persistent in waiting for it to appear?

Failure is not reproduced In latest run of testing_master_pki 1477 report

So maybe we can close this and see if it pops up again. It appears to be a rare-ish timing issue.

Metadata Update from @amore:
- Custom field affects_doc adjusted to on
- Custom field knownissue adjusted to on
- Issue close_status updated to: worksforme
- Issue status updated to: Closed (was: Open)

2 years ago

Metadata Update from @frenaud:
- Issue status updated to: Open (was: Closed)

2 years ago

@ckelley Who can we talk to on the pki team about this? I don't think its a bug in PKI but an expectation in the test. The issue is that aftering a sub-CA it isn't always immediately available in the CA (keys loaded). Is there a trigger we can use to determine that it has been loaded other than retrying? A hardcoded sleep is used now and it fails occasionally.

@rcritten It looks like when these tests were last refactored the sleep time dropped from 30 seconds to 15 seconds - apparently that was sufficient at the time in 2018 but is intermittently insufficient in 2022. I'm not aware of a trigger that you could listen to so you know when this has completed so from my perspective it looks like a retry loop is the way to go.

@edewata is likely to know more and @ftweedal did a lot of work in this area too so he knows a lot about the design of this functionality.

I'll try running it in a loop with periodic sleeps up to 60 seconds.

PR https://github.com/freeipa/freeipa/pull/6201

master:

  • 9ac8821 ipatests: Give the subCA more time to be loaded by the CA

ipa-4-9:

  • 3a4238b ipatests: Give the subCA more time to be loaded by the CA

Metadata Update from @abbra:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

2 years ago

Login to comment on this ticket.

Metadata