#8602 Nightly failure in test_acme.py::TestACME::test_certbot_certonly_standalone: An unexpected error occurred:
Closed: fixed 3 years ago by frenaud. Opened 3 years ago by amore.

The nightly test test_acme.py::TestACME::test_certbot_certonly_standalone
failed in [testing_master_pki] Nightly PR #560.

See PR #560 with logs and
report:

def test_certbot_certonly_standalone(self):
    # Get a cert from ACME service using HTTP challenge and Certbot's
    # standalone HTTP server mode
    self.clients[0].run_command(['systemctl', 'stop', 'httpd'])
    self.clients[0].run_command(
        [
            'certbot',
            '--server', self.acme_server,
            'certonly',
            '--domain', self.clients[0].hostname,
            '--standalone',
        ],
    )

The output is the following:

Plugins selected: Authenticator standalone, Installer None
Obtaining a new certificate
Performing the following challenges:
http-01 challenge for client0.ipa.test
Waiting for verification...
Cleaning up challenges
An unexpected error occurred:
acme.errors.ClientError: <Response [500]>
Please see the logfiles in /var/log/letsencrypt for more details.


Metadata Update from @frenaud:
- Issue tagged with: test-failure, tests

3 years ago

Was this tested against the latest PKI on master branch?
Is the problem happening consistently?
Are you testing with multiple clients?
Can this be reproduced with PKI only, without IPA?

We fixed some concurrency issues recently:

Package version is pki-ca-10.11.0-0.1.alpha1.20201120235153UTC.bce94aea.fc32.noarch

It's a new-ish failure. We merged some pretty hefty changes to the ACME testing on Friday, but these certbot and mod_md tests date back to Fraser's initial commit. It doesn't fail every time, but I can't say precisely how often.

We only test with certbot and mod_md right now. I think this is the first time I've ever seen the certbot tests fail. mod_md fails periodically to obtain a cert over ACME.

This is in the context of our CI. I haven't done any manual testing.

As to concurrency, I wonder. In these tests we have two CA servers which share the same DNS name, ipa-ca. It's very possible that the registration could go against one and the request against another. Could this be related to replication delay?

More logs from a PR today ( https://github.com/freeipa/freeipa/pull/5294 )

http://freeipa-org-pr-ci.s3-website.eu-central-1.amazonaws.com/jobs/9c65ce9c-3340-11eb-8000-fa163e462157/test_integration-test_acme.py-TestACMEwithExternalCA-test_mod_md/master.ipa.test/var/log/pki/pki-tomcat/acme/debug.2020-11-30.log.gz

That might be the case here. The ACME debug log should show when a nonce is created & destroyed. However, the invalid nonces reported in the log were never created in that server, so they're probably created in the other server and were not replicated in time.
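To make the suspected race concrete, here is a toy model of it. It is only a sketch under assumptions: the `AcmeServer` class, its in-memory set, and the `replicate_to` method are hypothetical stand-ins for the real PKI nonce storage and LDAP replication, and the "badNonce" string simply borrows the ACME error name for illustration.

```python
import secrets


class AcmeServer:
    """Hypothetical stand-in for one CA replica with its own nonce store."""

    def __init__(self, name):
        self.name = name
        self.nonces = set()

    def new_nonce(self):
        # Issue a random nonce and remember it locally only.
        nonce = secrets.token_urlsafe(16)
        self.nonces.add(nonce)
        return nonce

    def replicate_to(self, other):
        # Models replication eventually catching up on the other replica.
        other.nonces |= self.nonces

    def submit(self, nonce):
        # A nonce is single-use: accept it only if known, then discard it.
        if nonce not in self.nonces:
            return "badNonce"
        self.nonces.discard(nonce)
        return "ok"


server_a = AcmeServer("replica-a")
server_b = AcmeServer("replica-b")

# The nonce is issued by one replica, but the signed request lands on the
# other one before replication has happened.
nonce = server_a.new_nonce()
result_before = server_b.submit(nonce)

# Once replication has caught up, the same nonce is accepted.
server_a.replicate_to(server_b)
result_after = server_b.submit(nonce)

print(result_before, result_after)
```

In this model the first submission is rejected only because it reached a replica that had not yet seen the nonce, which matches the pattern in the debug log of nonces that were never created on that server.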

Is it possible to utilize a sticky session so the client will keep using the same server?
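As a rough illustration of what pinning would mean on the client side, the sketch below resolves a shared name once and then keeps using the same backend. Everything here is hypothetical: the `SharedName` and `StickyClient` classes and the replica hostnames are made up for the example and do not reflect how certbot, mod_md, or IPA actually choose a server.

```python
import random


class SharedName:
    """Hypothetical shared DNS name that can resolve to any replica."""

    def __init__(self, backends):
        self.backends = list(backends)

    def resolve(self, rng):
        # Each lookup may return a different replica.
        return rng.choice(self.backends)


class StickyClient:
    """Resolves the shared name once, then stays on that replica."""

    def __init__(self, shared_name, rng):
        self.shared_name = shared_name
        self.rng = rng
        self.pinned = None

    def backend(self):
        if self.pinned is None:
            self.pinned = self.shared_name.resolve(self.rng)
        return self.pinned


rng = random.Random(0)
# Hostnames below are placeholders, not the CI topology.
name = SharedName(["replica-a.example.test", "replica-b.example.test"])
client = StickyClient(name, rng)

# Nonce request, registration, and order would all go to the same replica.
chosen = [client.backend() for _ in range(5)]
print(chosen)
```

With the client pinned, the nonce is always validated by the replica that issued it, so replication delay no longer matters for that client.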

The other alternative is to replace the random nonce IDs stored in the database with an encrypted counter, but we don't have any plan to implement that in PKI 10.10.
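For reference, the general idea of a nonce that any replica can validate without a database lookup might look like the sketch below. This is not PKI's design. It swaps in an HMAC-authenticated counter instead of an encrypted one, because the Python standard library has no cipher, and `SHARED_KEY`, `issue_nonce`, and `verify_nonce` are hypothetical names. Single-use enforcement (rejecting a replayed counter) would still need some tracking and is not shown.

```python
import base64
import hashlib
import hmac
import struct

# Hypothetical secret that every replica would have to share.
SHARED_KEY = b"hypothetical-key-shared-by-all-replicas"
TAG_LEN = hashlib.sha256().digest_size


def issue_nonce(counter, key=SHARED_KEY):
    """Encode a 64-bit counter plus an HMAC tag as a URL-safe string."""
    payload = struct.pack(">Q", counter)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(payload + tag).rstrip(b"=").decode("ascii")


def verify_nonce(nonce, key=SHARED_KEY):
    """Return the counter if the tag is valid, otherwise None."""
    padded = nonce + "=" * (-len(nonce) % 4)
    try:
        raw = base64.urlsafe_b64decode(padded.encode("ascii"))
    except (ValueError, UnicodeEncodeError):
        return None
    if len(raw) != 8 + TAG_LEN:
        return None
    payload, tag = raw[:8], raw[8:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    return struct.unpack(">Q", payload)[0]


# A nonce issued with the shared key verifies on any replica holding that key.
nonce = issue_nonce(42)
print(verify_nonce(nonce))
```

Because validation depends only on the shared key, a nonce issued by one replica would be accepted by another without waiting for replication.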

master:

  • d0a1606 ipatests: remove test_acme from gating

ipa-4-9:

  • dd1b596 ipatests: remove test_acme from gating

Metadata Update from @frenaud:
- Issue tagged with: tracker

3 years ago

This issue was resolved by the fix for https://pagure.io/freeipa/issue/8712

master:

  • d2d487b Set the ACME baseURL in order to pin a client to a single IPA server
  • b1e72cb Add versions to the ACME config templates and update on upgrade
  • 3d2d067 Add some logging around initial ACME deployment

ipa-4-9:

  • a16dc59 Set the ACME baseURL in order to pin a client to a single IPA server
  • 31061c6 Add versions to the ACME config templates and update on upgrade
  • 6526ab4 Add some logging around initial ACME deployment

Metadata Update from @frenaud:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

3 years ago
