The nightly test test_acme.py::TestACME::test_certbot_certonly_standalone failed in [testing_master_pki] Nightly PR #560.
See PR #560 with logs and report:
    def test_certbot_certonly_standalone(self):
        # Get a cert from ACME service using HTTP challenge and
        # Certbot's standalone HTTP server mode
        self.clients[0].run_command(['systemctl', 'stop', 'httpd'])
        self.clients[0].run_command(
            [
                'certbot',
                '--server', self.acme_server,
                'certonly',
                '--domain', self.clients[0].hostname,
                '--standalone',
            ],
        )
The output is the following:
    Plugins selected: Authenticator standalone, Installer None
    Obtaining a new certificate
    Performing the following challenges:
    http-01 challenge for client0.ipa.test
    Waiting for verification...
    Cleaning up challenges
    An unexpected error occurred:
    acme.errors.ClientError: <Response [500]>
    Please see the logfiles in /var/log/letsencrypt for more details.
Metadata Update from @frenaud: - Issue tagged with: test-failure, tests
The request failed due to a bad session?
http://freeipa-org-pr-ci.s3-website.eu-central-1.amazonaws.com/jobs/f5643d48-3283-11eb-9f15-fa163ef949a7/test_integration-test_acme.py-TestACME-test_certbot_certonly_standalone/replica0.ipa.test/var/log/pki/pki-tomcat/acme/debug.2020-11-29.log.gz
@ftweedal @edewata what do you think?
Was this tested against the latest PKI on master branch? Is the problem happening consistently? Are you testing with multiple clients? Can this be reproduced with PKI only, without IPA?
We fixed some concurrency issues recently:
Package version is pki-ca-10.11.0-0.1.alpha1.20201120235153UTC.bce94aea.fc32.noarch
It's a new-ish failure. We merged some pretty hefty changes to the ACME testing on Friday, but these certbot and mod_md tests date back to Fraser's initial commit. It doesn't seem to be failing all the time, but I can't say how often with any precision.
We only test with certbot and mod_md right now. I think this is the first time I've ever seen the certbot tests fail. mod_md fails periodically to obtain a cert over ACME.
This is in the context of our CI. I haven't done any manual testing.
As to concurrency, I wonder. In these tests we have two CA servers which share the same DNS name, ipa-ca. It's very possible that the registration could go against one and the request against another. Could this be related to replication delay?
More logs from a PR today ( https://github.com/freeipa/freeipa/pull/5294 )
http://freeipa-org-pr-ci.s3-website.eu-central-1.amazonaws.com/jobs/9c65ce9c-3340-11eb-8000-fa163e462157/test_integration-test_acme.py-TestACMEwithExternalCA-test_mod_md/master.ipa.test/var/log/pki/pki-tomcat/acme/debug.2020-11-30.log.gz
That might be the case here. The ACME debug log should show when a nonce is created and destroyed. However, the invalid nonces reported in the log were never created on that server, so they were probably created on the other server and not replicated in time.
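The failure mode described above can be illustrated with a toy model (this is not PKI code, just a sketch of the race): each replica keeps its own nonce store, replication is not instantaneous, and a nonce minted by one server is rejected by another that has not yet received it.

```python
import secrets

class AcmeReplica:
    """Toy stand-in for one ACME server with a per-replica nonce store."""

    def __init__(self, name):
        self.name = name
        self.nonces = set()

    def new_nonce(self):
        nonce = secrets.token_hex(8)
        self.nonces.add(nonce)
        return nonce

    def consume_nonce(self, nonce):
        # An unknown nonce is rejected -- the "invalid nonce" error path.
        if nonce not in self.nonces:
            return False
        self.nonces.remove(nonce)
        return True

def replicate(src, dst):
    # Replication eventually copies nonces across, but only eventually.
    dst.nonces |= src.nonces

replica0 = AcmeReplica("replica0")
master = AcmeReplica("master")

nonce = replica0.new_nonce()        # client's first request hits replica0
ok = master.consume_nonce(nonce)    # next request lands on master: rejected
replicate(replica0, master)         # after replication the nonce is known
ok_after = master.consume_nonce(nonce)
```

Because both servers answer to the same shared name (ipa-ca), the client has no control over which replica sees which request, which matches the log evidence above.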
Is it possible to utilize a sticky session so the client will keep using the same server?
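One low-tech way to approximate a sticky session in this test setup would be to point certbot at a single replica's ACME directory URL instead of the shared ipa-ca name, so every request in a run hits the same server. This is only a sketch; the hostname below is illustrative, and whether the directory path matches the deployment should be checked:

```shell
# Hypothetical workaround: pin certbot to one replica's ACME endpoint
# rather than the shared ipa-ca alias, avoiding cross-replica requests.
certbot --server https://replica0.ipa.test/acme/directory \
    certonly --domain client0.ipa.test --standalone
```

This sidesteps the replication race for testing, but it also defeats the load-balancing purpose of the shared name, so it is a diagnostic aid rather than a fix.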
The other alternative is to replace the nonces with encrypted counters instead of random IDs stored in the database, but we don't have any plans to implement that in PKI 10.10.
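The "encrypted counter" idea could look something like the following sketch (hypothetical, not PKI code): each server derives nonces from a key shared by all replicas, so any replica can verify a nonce it did not issue, with no database write and hence no replication race.

```python
import base64
import hashlib
import hmac
import struct

# Illustrative only: in a real deployment this key would be provisioned
# securely and shared by all CA replicas.
SHARED_KEY = b"secret-shared-by-all-ca-replicas"

def issue_nonce(counter: int) -> str:
    """Encode a counter plus a MAC over it; no database write needed."""
    payload = struct.pack(">Q", counter)
    tag = hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(payload + tag).decode()

def verify_nonce(nonce: str) -> bool:
    """Any replica holding the shared key can validate the nonce."""
    raw = base64.urlsafe_b64decode(nonce)
    payload, tag = raw[:8], raw[8:]
    expected = hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)
```

A real design would also need replay protection (e.g. tracking consumed counters or bounding their validity window), which this sketch omits.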
master:
ipa-4-9:
Metadata Update from @frenaud: - Issue tagged with: tracker
The issue was fixed by the fix for https://pagure.io/freeipa/issue/8712 master:
Metadata Update from @frenaud: - Issue close_status updated to: fixed - Issue status updated to: Closed (was: Open)