#8343 ipa-client-install registration sometimes fails
Closed: worksforme 3 years ago by pcech. Opened 3 years ago by schlitzered.

Request for enhancement

As an admin, I want instance registrations to always succeed.

Issue

Sometimes IdM client registration fails with this error:

INFO - Joining realm failed: TLSMC: MozNSS compatibility interception begins.
INFO - tlsmc_open_nssdb: WARN: could not initialize MozNSS context - error -8015.
INFO - tlsmc_convert: INFO: cannot open the NSS DB, expecting PEM configuration is present.
INFO - tlsmc_intercept_initialization: INFO: successfully intercepted TLS initialization. Continuing with OpenSSL only.
INFO - TLSMC: MozNSS compatibility interception ends.
INFO - Bind failed: Invalid credentials

Since the instance that was used to add the host and the instance chosen by DNS service discovery were not the same, I suspect that the second instance was simply not yet aware of the new host. So the OTP was not yet present there, and the ipa-client-install script was not able to log in.

Steps to Reproduce

  1. Have more than one IdM server (the setup is equivalent to the 12-node/4-region setup).
  2. Create the host in IdM instance A.
  3. Try to register the host using IdM instance B.

Actual behavior

Registration sometimes fails.

Expected behavior

Registration should never fail.

Version/Release/Distribution

ipa-server-4.6.6-11.el7.x86_64
ipa-client-4.6.6-11.el7.x86_64
389-ds-base-1.3.10.1-5.el7.x86_64
pki-ca-10.5.17-6.el7.noarch
krb5-server-1.15.1-46.el7.x86_64

Additional info:

This happens with AWS EC2 instances. They boot up quickly, so there may be only seconds between host creation and host registration.

Is there a way to know how long replication from one IdM instance to another takes?

If so, we could simply introduce a sleep in the registration script that runs within the EC2 instance (we have a wrapper around ipa-client-install).

But I think the best solution would be for the ipa-client-install script to do a pre-check to see whether it is able to log in to the instance chosen by DNS service discovery and, if login is not possible, either wait or try another instance.


How do you enrol the hosts? Are you creating the host entries on another machine and then using an OTP to enrol each host? Or are you using the same user account to mass-enrol hosts in parallel? You mention an OTP, but I want to be sure, because there are two different things that can go wrong here.

In general replication takes as long as it takes. There is no time limit or guarantee. It depends on multiple factors like network performance and cluster activity.

Could you please provide the server logs and LDAP logs of the registration and bind process as well as the ipaclient install log?

PS: The MozNSS messages are unrelated and a red herring.

There is an AWS Lambda function in place that picks up EC2 lifecycle events. The Lambda function calls an API that creates/deletes DNS entries for the instance, adds the host to IdM/FreeIPA, and requests an OTP.

The created FQDN and OTP are then stored in the API.

A user-data script talks to the API, fetches the FQDN and OTP for the instance, and runs ipa-client-install.

The source code is here if you are interested: https://github.com/schlitzered/CatWeazle

Unfortunately, the client-install logs are already gone, since I manually reran the registration process to fix the instance. But I can provide the server logs if you tell me which ones you are interested in: the Apache error_log and/or the dirsrv logs?

Sounds like a race with replication. Is the provisioning already done? Deploying the image, booting, etc. usually add enough lag before the enrolment script runs that the OTP has already replicated by then.

ipa-join has a unique return code for a bad password (15), but ipa-client-install just returns 1, so distinguishing this failure from others could be difficult.

As a brute-force approach, you could add a sleep to the script. Ugly, of course.

Or you could retry on failure. ipa-client-install will log this:

2020-06-03T20:23:40Z DEBUG args=['/usr/sbin/ipa-join', '-s', 'ipa.example.test', '-b', 'dc=example,dc=test', '-h', 'replica.example.test', '-w', XXXXXXXX]
2020-06-03T20:23:40Z DEBUG Process finished, return code=15

You could grep the log for 'return code=15' to see whether the password was rejected, then sleep and try again.
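A minimal sketch of that retry loop as a POSIX shell function for a wrapper like the one described in this ticket. It assumes the client log is at /var/log/ipaclient-install.log (assumed to be the default location), that the ipa-join "args=" line is followed directly by its "Process finished, return code=..." line as in the excerpt above, and that a failed join can simply be re-run. The names enroll_with_retry and otp_rejected, the attempt count, and the delay are hypothetical, not IPA options.

```shell
#!/bin/sh
# Hypothetical retry wrapper around ipa-client-install (a sketch, not an IPA feature).
# Assumed: the log path below, and that ipa-join's "args=" log line is followed
# by its "Process finished, return code=..." line.

IPA_LOG="${IPA_LOG:-/var/log/ipaclient-install.log}"  # assumed default log path
MAX_ATTEMPTS="${MAX_ATTEMPTS:-5}"                     # hypothetical attempt budget
RETRY_DELAY="${RETRY_DELAY:-30}"                      # hypothetical delay in seconds

# Succeeds if the log shows ipa-join exiting with 15 (password/OTP rejected).
otp_rejected() {
    grep -A1 "args=.*ipa-join" "$1" 2>/dev/null | grep -q "return code=15"
}

# Runs the given install command; retries only while the OTP is being rejected.
# Returns 0 on success, otherwise the exit status of the last attempt.
enroll_with_retry() {
    local attempt rc
    attempt=1
    rc=1
    # Keep any earlier log, but make sure each check sees a single attempt.
    if [ -f "$IPA_LOG" ]; then
        mv -f "$IPA_LOG" "$IPA_LOG.$$.previous"
    fi
    while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
        rc=0
        "$@" || rc=$?
        if [ "$rc" -eq 0 ]; then
            return 0
        fi
        if ! otp_rejected "$IPA_LOG"; then
            echo "enrollment failed (rc=$rc) for a reason other than the OTP; not retrying" >&2
            return "$rc"
        fi
        if [ "$attempt" -lt "$MAX_ATTEMPTS" ]; then
            echo "OTP rejected (attempt $attempt/$MAX_ATTEMPTS); retrying in ${RETRY_DELAY}s" >&2
            # Preserve this attempt's log before the next run writes a new one.
            mv -f "$IPA_LOG" "$IPA_LOG.$$.attempt$attempt"
            sleep "$RETRY_DELAY"
        fi
        attempt=$((attempt + 1))
    done
    echo "OTP still rejected after $MAX_ATTEMPTS attempts" >&2
    return "$rc"
}
```

Usage, with the command from the README fix: `enroll_with_retry ipa-client-install -w "${OTP}" --mkhomedir --force-join --unattended`. Retrying only on return code 15 lets other failures fail fast, and each rejected attempt's log is kept next to the original instead of being lost. Since replication has no guaranteed upper bound, MAX_ATTEMPTS times RETRY_DELAY is a budget, not a guarantee.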

Unfortunately, without credentials the machine cannot query IPA to check, for example, whether the host entry exists.

I noticed a bug in your README. The enrollment script should be doing:

ipa-client-install -w "${OTP}" --mkhomedir --force-join --unattended

You have FQDN there instead of the OTP.

I guess I will simply go with the "sleep", since it is the easiest to implement.

But maybe it would make sense to introduce an optional retry feature in ipa-client-install/ipa-join?

Metadata Update from @pcech:
- Issue close_status updated to: worksforme
- Issue status updated to: Closed (was: Open)

