As an admin, I want instance registrations to always succeed.
Sometimes IdM client registration fails with this error:
INFO - Joining realm failed: TLSMC: MozNSS compatibility interception begins.
INFO - tlsmc_open_nssdb: WARN: could not initialize MozNSS context - error -8015.
INFO - tlsmc_convert: INFO: cannot open the NSS DB, expecting PEM configuration is present.
INFO - tlsmc_intercept_initialization: INFO: successfully intercepted TLS initialization. Continuing with OpenSSL only.
INFO - TLSMC: MozNSS compatibility interception ends.
INFO - Bind failed: Invalid credentials
Since the instance that was used to add the host and the instance chosen by DNS service discovery were not the same, I suspect that the second instance was simply not yet aware of the new host. The OTP was therefore not yet present there, and the ipa-client-install script was not able to log in.
Registration sometimes fails.
Registration should never fail.
ipa-server-4.6.6-11.el7.x86_64
ipa-client-4.6.6-11.el7.x86_64
389-ds-base-1.3.10.1-5.el7.x86_64
pki-ca-10.5.17-6.el7.noarch
krb5-server-1.15.1-46.el7.x86_64
This happens with AWS EC2 instances. They boot up pretty fast, so there may be only seconds between host creation and host registration.
Is there a way to know how long replication from one IdM instance to another takes?
If yes, we could simply introduce a sleep in the registration script that runs within the EC2 instance (we have a wrapper around ipa-client-install).
But I think the best solution would be for the ipa-client-install script to do a pre-check to see whether it is able to log in to the instance chosen by DNS service discovery and, if login is not possible, either wait or try another instance.
How do you enrol the hosts? Are you creating the host entries on another machine and then using an OTP token to enrol the host? Or are you using the same user account to mass-enrol hosts in parallel? You mention OTP, but I want to be sure because there are two things that can go wrong here.
In general replication takes as long as it takes. There is no time limit or guarantee. It depends on multiple factors like network performance and cluster activity.
Could you please provide the server logs and LDAP logs of the registration and bind process as well as the ipaclient install log?
PS: The MozNSS messages are unrelated and a red herring.
There is an AWS Lambda function in place that picks up EC2 life cycle events. The Lambda function calls an API that creates/deletes DNS entries for the instance, adds the host to IdM/FreeIPA, and requests an OTP.
The created FQDN and OTP are then stored in the API.
A userdata script talks to the API, fetches the FQDN and OTP for the instance, and runs ipa-client-install.
The source code is here if you are interested: https://github.com/schlitzered/CatWeazle
Unfortunately the client-install logs are already gone, since I manually reran the registration process to fix the instance. But I can provide the server logs if you tell me which logs you are interested in: the Apache error_log, the dirsrv logs, or both?
Sounds like a race with replication. Is the provisioning already done? That usually adds enough lag between deploying the image, booting, etc before it gets to the point of the enrolment script that replication of the OTP is handled without issue.
ipa-join has a unique return code for bad password (15) but ipa-client-install will just return 1 so distinguishing could be difficult.
As a brute-force fix, you could add a sleep to the script. Ugly, of course.
Or you could re-try on failure. ipa-client-install will log this:
2020-06-03T20:23:40Z DEBUG args=['/usr/sbin/ipa-join', '-s', 'ipa.example.test', '-b', 'dc=example,dc=test', '-h', 'replica.example.test', '-w', XXXXXXXX]
2020-06-03T20:23:40Z DEBUG Process finished, return code=15
You could grep for 'return code=15' to see if the password wasn't accepted, sleep, then try again.
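A minimal sketch of such a grep-sleep-retry wrapper, under some assumptions: the installer log is at /var/log/ipaclient-install.log and is appended to between runs, and the names LOG_FILE, MAX_ATTEMPTS, RETRY_DELAY, otp_rejected_since and enrol_with_retry are made up for this example. It is not part of ipa-client-install.

```shell
#!/bin/sh
# Sketch only: re-run an enrolment command while the log shows that
# ipa-join rejected the password (return code=15).
# LOG_FILE, MAX_ATTEMPTS and RETRY_DELAY are hypothetical knobs.
LOG_FILE="${LOG_FILE:-/var/log/ipaclient-install.log}"
MAX_ATTEMPTS="${MAX_ATTEMPTS:-5}"
RETRY_DELAY="${RETRY_DELAY:-30}"

# Succeeds if the log contains 'return code=15' after byte offset $1.
otp_rejected_since() {
    start="$1"
    [ -f "$LOG_FILE" ] || return 1
    size=$(( $(wc -c < "$LOG_FILE") ))
    # A smaller log means it was rewritten, so scan it from the beginning.
    # (Assumption: otherwise the log was appended to.)
    if [ "$size" -lt "$start" ]; then
        start=0
    fi
    tail -c "+$((start + 1))" "$LOG_FILE" | grep -q 'return code=15'
}

# Usage: enrol_with_retry COMMAND [ARGS...]
enrol_with_retry() {
    attempt=1
    while :; do
        # Remember the log size so only this attempt's output is inspected.
        offset=0
        if [ -f "$LOG_FILE" ]; then
            offset=$(( $(wc -c < "$LOG_FILE") ))
        fi
        "$@" && return 0
        # Retry only when the password was rejected; give up on anything else.
        otp_rejected_since "$offset" || return 1
        if [ "$attempt" -ge "$MAX_ATTEMPTS" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$RETRY_DELAY"
    done
}

# Example call from the userdata wrapper (not executed here):
# enrol_with_retry ipa-client-install -w "${OTP}" --mkhomedir --force-join --unattended
```

Retrying only on return code 15 avoids looping on failures that waiting cannot fix, and MAX_ATTEMPTS bounds the total delay if the OTP never arrives.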
Unfortunately with no credentials the machine won't be able to query to see if the host exists in IPA, for example.
I noticed a bug in your README. The enrollment script should be doing:
ipa-client-install -w "${OTP}" --mkhomedir --force-join --unattended
Your README has FQDN in place of OTP.
I guess I will simply go with the "sleep", since it is the easiest to implement.
But maybe it would make sense to introduce an optional retry feature in ipa-client-install/ipa-join?
Metadata Update from @pcech: - Issue close_status updated to: worksforme - Issue status updated to: Closed (was: Open)