I have been troubleshooting an error renewing machine account credentials for over 6 months. At seemingly random times, with weeks to months between cases, a seemingly random machine (out of a three-digit total) fails to renew its machine account credentials, and then of course SSSD stops working with AD. I have not yet found out whether the problem is in adcli, SSSD, or something else.
I finally managed to capture a log of the error, and it shows a possible bug in SSSD.
SSSD log says the adcli operation "finished successfully":
(Tue Jun 5 12:07:39 2018) [sssd[be[example.com]]] [be_ptask_done] (0x0400): Task [AD machine account password renewal]: finished successfully
Meanwhile, adcli did not actually finish successfully; it reported an error:
<adcli output cut out>
! Cannot change computer password: Authentication error
adcli: updating membership with domain example.com failed: Cannot change computer password: Authentication error
So it seems that either SSSD thinks the operation was successful even though it had an error, or the SSSD log message is misleading.
I have no explanation right now for why the "Authentication error" happens, so troubleshooting of the actual problem will continue.
I agree that the 'finished successfully' message here is a bit confusing. It basically says that SSSD was able to run adcli, but it does not reflect the result of the adcli operation.
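To illustrate the difference between "the task ran" and "the command succeeded", here is a minimal Python sketch. It is not how SSSD is implemented; `run_task` is a hypothetical helper, the child process is a stand-in rather than adcli, and it assumes the child signals failure through a non-zero exit status.

```python
import subprocess
import sys


def run_task(cmd):
    """Run cmd and report separately whether it launched and whether it succeeded."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True)
    except OSError as exc:
        # The command could not even be started (e.g. binary not found).
        return {"launched": False, "succeeded": False, "detail": str(exc)}
    # Launched fine; success depends on the child's exit status.
    return {
        "launched": True,
        "succeeded": proc.returncode == 0,
        "detail": proc.stderr.strip(),
    }


# Stand-in child process that starts normally but exits with an error.
failing_child = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('simulated failure'); sys.exit(1)",
]
result = run_task(failing_child)
print(result["launched"], result["succeeded"], result["detail"])
```

A log message based only on `launched` would say the task finished even though `succeeded` is false, which is the kind of mismatch seen in the logs above.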
About the 'Authentication error' error: this is most probably a timeout issue while using UDP (which is the default). For the password change, libkrb5 sends a UDP packet with the needed data to the AD DC. If there is no reply after 1s (hardcoded), it sends the packet again because libkrb5 assumes the packet was lost. But if the AD DC received the first packet and is still busy processing it, it will reply with KRB5KRB_AP_ERR_REPEAT when the second packet arrives. Since this reply is not specified in the related RFC 3244, libkrb5 assumes the most common issue and returns 'Authentication error'. With this error, adcli has to assume the password change failed and does not update the local keytab with the new keys. The AD DC, however, did receive the first packet, so it will update the password and won't accept the old one anymore.
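The sequence above can be sketched as a toy model in Python. Nothing here talks to the network or uses libkrb5; `change_password` and its arguments are hypothetical names, and the only figure taken from the description is the 1s retransmit timeout.

```python
RETRY_TIMEOUT_S = 1.0  # retransmit delay described above


def change_password(dc_processing_s, old_password, new_password):
    """Toy model of the race; returns (client_result, keytab_password, dc_password)."""
    # The DC received the first packet, so it applies the change in both cases.
    dc_password = new_password

    if dc_processing_s <= RETRY_TIMEOUT_S:
        # The reply arrives before the client retransmits: both sides agree.
        return "success", new_password, dc_password

    # No reply within the timeout: the client resends, the DC sees a repeat and
    # answers with a replay error, which the client reports as an
    # authentication error. The keytab keeps the old password.
    return "Authentication error", old_password, dc_password


print(change_password(0.2, "old", "new"))
print(change_password(3.0, "old", "new"))
```

In the second call the keytab still holds the old password while the DC already holds the new one, which matches the state the affected machines end up in.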
This issue is known upstream as http://krbdev.mit.edu/rt/Ticket/Display.html?id=7905.
To get around this I would recommend effectively disabling UDP by setting:
udp_preference_limit = 0
in the [libdefaults] section of /etc/krb5.conf. This is useful in AD environments in general as well, because due to the PAC the Kerberos tickets are typically larger than a UDP packet, and libkrb5 has to fall back to TCP anyway after trying UDP.
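When rolling this out to many machines, a quick check that the setting is actually in place can help. The following Python sketch is a simplified line-based reader, not a full krb5.conf parser: it ignores include directives and nested blocks, and returns the first value it finds. The function name and the sample realm and KDC host are made up for illustration.

```python
def udp_preference_limit(conf_text):
    """Return udp_preference_limit from [libdefaults] as an int, or None if unset."""
    section = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()  # entering a new top-level section
            continue
        if section == "libdefaults" and "=" in line:
            key, _, val = line.partition("=")
            if key.strip() == "udp_preference_limit":
                return int(val.strip())
    return None


# Sample configuration text; realm and KDC host are placeholders.
sample = """
[libdefaults]
    default_realm = EXAMPLE.COM
    udp_preference_limit = 0

[realms]
    EXAMPLE.COM = {
        kdc = dc1.example.com
    }
"""
print(udp_preference_limit(sample))
```

On a real host the text would be read from /etc/krb5.conf; a result of `None` means the setting is not present in that file.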
Thank you very much, Sumit, for the pointers on the machine password change issue.
I've finally had the time to start testing whether the problem is related to UDP as you suggest. I will roll this change out in a few phases and should see in a few months whether the problems have stopped.