#7623 Replica install: certmonger sometimes fails
Closed: fixed a year ago. Opened a year ago by cheimes.

Issue

During parallel replica installation, a certmonger request sometimes fails with CA_REJECTED or CA_UNREACHABLE. The error occurs when the master is either busy or some information hasn't been replicated yet. Even a stuck request can be recovered, e.g. once permission and group information have been replicated.

Steps to Reproduce

Install 3 or more replicas simultaneously

Actual behavior

In some cases, a cert request fails:

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ipaserver/install/service.py", line 556, in start_creation
    run_step(full_msg, method)
  File "/usr/lib/python2.7/site-packages/ipaserver/install/service.py", line 546, in run_step
    method()
  File "/usr/lib/python2.7/site-packages/ipaserver/install/dsinstance.py", line 836, in __enable_ssl
    post_command=cmd)
  File "/usr/lib/python2.7/site-packages/ipalib/install/certmonger.py", line 317, in request_and_wait_for_cert
    raise RuntimeError("Certificate issuance failed ({})".format(state))
RuntimeError: Certificate issuance failed (CA_UNREACHABLE)

or

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ipaserver/install/service.py", line 556, in start_creation
    run_step(full_msg, method)
  File "/usr/lib/python2.7/site-packages/ipaserver/install/service.py", line 546, in run_step
    method()
  File "/usr/lib/python2.7/site-packages/ipaserver/install/dsinstance.py", line 836, in __enable_ssl
    post_command=cmd)
  File "/usr/lib/python2.7/site-packages/ipalib/install/certmonger.py", line 317, in request_and_wait_for_cert
    raise RuntimeError("Certificate issuance failed ({})".format(state))
RuntimeError: Certificate issuance failed (CA_REJECTED)

Expected behavior

No error from certmonger

Additional info:

In all cases, I was able to get a new cert by resubmitting the certmonger request.
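Since a manual resubmit recovered every failed request, the same idea can be sketched as an automatic retry loop. This is only a minimal illustration, not the code from the linked pull request: `wait_for_final_state` and `resubmit` are hypothetical callables standing in for whatever talks to certmonger, and treating `MONITORING` as the success state is an assumption of this sketch. The two retryable states are the ones from the tracebacks above.

```python
import time

# Failure states seen in the tracebacks above; this sketch treats both as
# transient and worth retrying.
RETRYABLE_STATES = {"CA_REJECTED", "CA_UNREACHABLE"}


def request_with_retry(wait_for_final_state, resubmit, max_retries=3,
                       delay=0.0, success_state="MONITORING",
                       sleep=time.sleep):
    """Wait for a request to settle, resubmitting it on transient failures.

    wait_for_final_state: hypothetical callable that blocks until the request
        leaves its in-progress states and returns the resulting state string.
    resubmit: hypothetical callable that resubmits the same request.
    Returns the success state, or raises RuntimeError once retries run out
    or a non-retryable state is reached.
    """
    retries = 0
    while True:
        state = wait_for_final_state()
        if state == success_state:
            return state
        if state not in RETRYABLE_STATES or retries >= max_retries:
            raise RuntimeError(
                "Certificate issuance failed ({})".format(state))
        retries += 1
        sleep(delay)  # give the master time to catch up on replication
        resubmit()


def make_fake_request(states):
    """Build fake callables for a dry run: a state source and a counter."""
    remaining = iter(states)
    resubmits = []
    return (lambda: next(remaining)), (lambda: resubmits.append(1)), resubmits
```

For example, a request that fails twice and then succeeds needs two resubmits: with `wait, resubmit, log = make_fake_request(["CA_UNREACHABLE", "CA_REJECTED", "MONITORING"])`, calling `request_with_retry(wait, resubmit)` returns `"MONITORING"` and leaves `len(log) == 2`.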


Metadata Update from @cheimes:
- Custom field on_review adjusted to https://github.com/freeipa/freeipa/pull/2122
- Issue priority set to: critical
- Issue set to the milestone: FreeIPA 4.5.5

a year ago

master:

  • 1fa2a7c Auto-retry failed certmonger requests
  • 2b669c5 Wait for client certificates

ipa-4-6:

  • ab8a739 Auto-retry failed certmonger requests
  • bde0b51 Wait for client certificates

ipa-4-5:

  • ec60901 replicainstall: DS SSL replica install pick right certmonger host
  • 5ef8333 Fix race condition in get_locations_records()
  • a9cc862 Tune DS replication settings
  • 79fe981 Auto-retry failed certmonger requests
  • f3dd0cb Wait for client certificates

I'd also like to increase the timeout of certmonger from 10 seconds to 20 or 30 seconds and increase the verbosity of the certmonger logs. Is there a way to log each certmonger operation and each stage of an operation?

I'm not sure what timeout you are referring to. When it hits a connection error it quits immediately.

You can enable debugging in the daemon, that's about it.

The helpers won't be logged though for versions < 0.79.6.

I'm talking about this ca-error: "Server at https://master.ipa.example/ipa/xml failed request, will retry: 4214 (RPC failed at server. Configured time limit exceeded)." There seems to be a 10 second timeout somewhere. I haven't found the right knob to turn yet. Is it the dbus timeout of 10,000?

Install log

2018-07-06T21:11:27Z DEBUG certmonger request is in state dbus.String(u'NEWLY_ADDED_READING_KEYINFO', variant_level=1)
2018-07-06T21:11:32Z DEBUG certmonger request is in state dbus.String(u'SUBMITTING', variant_level=1)
2018-07-06T21:11:38Z DEBUG certmonger request is in state dbus.String(u'SUBMITTING', variant_level=1)
2018-07-06T21:11:43Z DEBUG certmonger request is in state dbus.String(u'SUBMITTING', variant_level=1)
2018-07-06T21:11:48Z DEBUG certmonger request is in state dbus.String(u'CA_UNREACHABLE', variant_level=1)
2018-07-06T21:11:48Z DEBUG Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ipaserver/install/service.py", line 556, in start_creation
    run_step(full_msg, method)
  File "/usr/lib/python2.7/site-packages/ipaserver/install/service.py", line 546, in run_step
    method()
  File "/usr/lib/python2.7/site-packages/ipaserver/install/dsinstance.py", line 836, in __enable_ssl
    post_command=cmd)
  File "/usr/lib/python2.7/site-packages/ipalib/install/certmonger.py", line 317, in request_and_wait_for_cert
    raise RuntimeError("Certificate issuance failed ({})".format(state))
RuntimeError: Certificate issuance failed (CA_UNREACHABLE)
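To put a number on the suspected timeout, the state transitions in the install log can be turned into a timeline. The following is a rough sketch that parses only the log lines quoted above; the function names are made up for illustration. The installer logs the state about every five seconds, so the result is a coarse bound rather than an exact duration.

```python
import re
from datetime import datetime

# The five state lines from the install log above, copied verbatim.
LOG = """\
2018-07-06T21:11:27Z DEBUG certmonger request is in state dbus.String(u'NEWLY_ADDED_READING_KEYINFO', variant_level=1)
2018-07-06T21:11:32Z DEBUG certmonger request is in state dbus.String(u'SUBMITTING', variant_level=1)
2018-07-06T21:11:38Z DEBUG certmonger request is in state dbus.String(u'SUBMITTING', variant_level=1)
2018-07-06T21:11:43Z DEBUG certmonger request is in state dbus.String(u'SUBMITTING', variant_level=1)
2018-07-06T21:11:48Z DEBUG certmonger request is in state dbus.String(u'CA_UNREACHABLE', variant_level=1)
"""

STATE_LINE = re.compile(
    r"^(\S+)Z DEBUG certmonger request is in state dbus\.String\(u'([A-Z_]+)'"
)


def state_timeline(log_text):
    """Return a list of (timestamp, state) tuples found in the log text."""
    events = []
    for line in log_text.splitlines():
        match = STATE_LINE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
            events.append((stamp, match.group(2)))
    return events


def seconds_between(events, first_state, last_state):
    """Seconds from the first sighting of first_state to that of last_state."""
    start = next(ts for ts, state in events if state == first_state)
    end = next(ts for ts, state in events if state == last_state)
    return (end - start).total_seconds()


events = state_timeline(LOG)
print(seconds_between(events, "SUBMITTING", "CA_UNREACHABLE"))  # 16.0
```

Given the polling interval, the request spent somewhere between roughly 11 and 21 seconds in SUBMITTING before certmonger reported CA_UNREACHABLE.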

Are you sure it is timing out and not failing outright? The curl default connect timeout in lib/connect.h is 5 minutes, but from what I can tell that includes everything from doing the DNS lookups to actually trying to connect.

Should this be broken out from the original request as a separate ticket since part of this is already committed?

Metadata Update from @frenaud:
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1623113

a year ago

@cheimes
Can you open a separate ticket to track your idea from the above comment?
Closing this one as the fix has already been pushed upstream.

Metadata Update from @frenaud:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

a year ago

After discussion with @cheimes there was no reproducer for the timeout issue. We can consider this ticket closed and will open a separate one if the timeout is seen again.
