#3285 SSSD needs restart after incorrect clock is corrected with AD

Created 8 months ago by vojamo
Modified 7 months ago

Steps to reproduce:
1. SSSD on an AD domain member, CentOS6 with sssd-1.13.3-22.el6_8.4.x86_64
2. Clock is wrong
3. Server boots
4. SSSD starts before NTP
5. Kerberos fails because of clock skew
6. NTP starts, clock is fixed
7. Logins still fail even after the clock is fixed

Expected behavior:
Login should work after clock is corrected.

Attached sssd_domain.log with loglevel 7. Test case:
1. Stop SSSD
2. Fudge up the clock
3. Start SSSD
4. Login does not work (expected, since Kerberos on RHEL6 fails on clock skew)
5. Restart NTP (to get time sync, or sync the time by other means)
6. Login still does not work (unexpected)

From messages (these timestamps are on the incorrect clock, +7h):
{{{
Jan 23 13:56:57 s03 [sssd[ldap_child[2275]]]: Failed to initialize credentials using keytab [MEMORY:/etc/krb5.keytab]: Clock skew too great. Unable to create GSSAPI-encrypted LDAP connection.
Jan 23 13:56:57 s03 [sssd[ldap_child[2275]]]: Clock skew too great
Jan 23 13:56:59 s03 ntpd[2337]: ntpd 4.2.6p5@1.2349-o Tue May 31 10:09:21 UTC 2016 (1)
}}}

After the above log events, the clock is fixed by ntpd.

From secure, about 8 minutes later, when a login is attempted:
{{{
Jan 23 07:04:42 s03 sshd[5490]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=1 user=aduser
Jan 23 07:04:43 s03 sshd[5490]: pam_sss(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=1 user=aduser
Jan 23 07:04:43 s03 sshd[5490]: pam_sss(sshd:auth): received for user aduser: 9 (Authentication service cannot retrieve authentication info)
Jan 23 07:04:45 s03 sshd[5490]: Failed password for aduser from 1 port 50167 ssh2
}}}

Fields changed

version: 1.14.2 => 1.13.4

Thanks for the report. I haven't tested this yet, but I suspect the cause is in the connection failover code: when we mark a server as offline, we only check it again after a certain time has passed. Because the clock jumped backwards in your case, that condition is never true.
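The suspected failure mode can be illustrated with a small simulation. This is only a sketch of the general pattern, not SSSD's actual failover code: `ServerStatus`, `may_retry_wall`, `may_retry_mono` and the 30-second `RETRY_TIMEOUT` are hypothetical names and values chosen for illustration. It compares a retry check based on wall-clock timestamps with one based on a monotonic clock, using the +7h offset and the roughly 8-minute gap from the report.

```python
RETRY_TIMEOUT = 30  # seconds; hypothetical value, not SSSD's real setting


class ServerStatus:
    """Hypothetical record of when a server was marked as not working."""

    def __init__(self, wall_now, mono_now):
        self.failed_wall = wall_now  # wall-clock timestamp (can be stepped)
        self.failed_mono = mono_now  # monotonic timestamp (never goes back)


def may_retry_wall(status, wall_now):
    # Stays False after a backwards step until the wall clock
    # catches up with the stale failure timestamp.
    return wall_now - status.failed_wall > RETRY_TIMEOUT


def may_retry_mono(status, mono_now):
    # Unaffected by wall-clock steps.
    return mono_now - status.failed_mono > RETRY_TIMEOUT


if __name__ == "__main__":
    skew = 7 * 3600      # clock is 7 hours fast when the failure is recorded
    real = 1_000_000.0   # arbitrary "true" time for the simulation
    status = ServerStatus(wall_now=real + skew, mono_now=500.0)

    # Eight minutes later, after NTP has corrected the wall clock.
    later_wall = real + 8 * 60
    later_mono = 500.0 + 8 * 60

    print("wall-clock check allows retry:", may_retry_wall(status, later_wall))
    print("monotonic check allows retry:", may_retry_mono(status, later_mono))
```

With these numbers the wall-clock difference is negative, so the wall-clock check refuses to retry, while the monotonic check sees 480 elapsed seconds and allows it.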

By the way, there is a simple workaround: send SIGUSR2 to SSSD after the clock is adjusted.
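A minimal sketch of that workaround, for running as root once the clock has been corrected. The pid file path `/var/run/sssd.pid` is an assumption about the local installation, and `read_pid` and `signal_sssd_online` are hypothetical helper names, so adjust them as needed.

```python
import os
import signal

PID_FILE = "/var/run/sssd.pid"  # assumed location; adjust for your system


def read_pid(path):
    """Return the PID stored in a pid file as an integer."""
    with open(path) as f:
        return int(f.read().strip())


def signal_sssd_online(pid_file=PID_FILE):
    """Send SIGUSR2 to the process named in pid_file and return its PID."""
    pid = read_pid(pid_file)
    os.kill(pid, signal.SIGUSR2)
    return pid
```

Calling `signal_sssd_online()` could, for example, be hooked into whatever step performs the time correction, which avoids a full SSSD restart.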

Fields changed

milestone: NEEDS_TRIAGE => SSSD Future releases (no date set yet)

Fields changed

rhbz: => 0

Replying to [comment:4 jhrozek]:
I am concerned about this triage decision. While I agree that in the future we might do something about it inside SSSD, for now should we have some integration with NTP via systemd to always signal SSSD to restart when the time is adjusted? Time drift is a really common thing, and if SSSD has to be restarted manually, this is not a scalable solution.

Perhaps. It's just a matter of priorities. Perhaps we can use a backlog for the next version instead of the future releases; I'm just concerned that with the amount of work already scheduled and the time available, this item might not get done for the next release.

Perhaps we can do a more low-tech solution and just reset the failover status globally when we detect a time shift with the watchdog (which we already do)?
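The low-tech idea could look roughly like the sketch below. How SSSD's watchdog actually detects a time shift is not described in this ticket, so everything here is an assumption: `TimeShiftDetector`, `reset_failover_status`, the status strings and the 5-second `SHIFT_THRESHOLD` are hypothetical. The detector compares how far the wall clock and a monotonic clock advanced between two checks; a large difference means the wall clock was stepped.

```python
import time

SHIFT_THRESHOLD = 5.0  # seconds; hypothetical tolerance


class TimeShiftDetector:
    """Detect wall-clock steps by comparing against a monotonic clock."""

    def __init__(self, wall=time.time, mono=time.monotonic):
        self._wall = wall
        self._mono = mono
        self._last_wall = wall()
        self._last_mono = mono()

    def check(self):
        """Return the detected wall-clock step in seconds (0.0 if none)."""
        wall_now = self._wall()
        mono_now = self._mono()
        wall_delta = wall_now - self._last_wall
        mono_delta = mono_now - self._last_mono
        self._last_wall = wall_now
        self._last_mono = mono_now
        shift = wall_delta - mono_delta
        return shift if abs(shift) > SHIFT_THRESHOLD else 0.0


def reset_failover_status(servers):
    """Hypothetical global reset: forget every recorded server failure."""
    for name in servers:
        servers[name] = "unknown"
```

A periodic tick would call `check()` and, on a non-zero result, call `reset_failover_status()` so that the next request tries the servers again instead of waiting for a timeout computed from a stale timestamp.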

milestone: SSSD Future releases (no date set yet) => NEEDS_TRIAGE

I vote "no" for anything involving systemd. Additionally, time drift (in the strict definition of the term) would not cause a Kerberos problem due to clock skew. Time drift would not cause the clock to go backwards, or even forward with a large step; it is a slow, gradual change, and I fail to see how it is relevant. Time drift may be common, but would it really cause this problem with SSSD? This ticket is not about time drift; it is a case where the clock is wildly wrong.

Edit: if you are thinking system-level changes, why not push for ntpd being started before sssd? Would that not be a permanent and elegant solution for the Linux system you care about (RHEL)?
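To put rough numbers on the drift versus step distinction made above: MIT Kerberos tolerates a clock difference of 300 seconds by default (the `clockskew` setting). The sketch below checks the +7h offset from this report against that limit and estimates how long a free-running clock would take to drift past it. The 100 ppm drift rate is a hypothetical figure chosen for illustration, and both function names are made up for this example.

```python
MAX_CLOCK_SKEW = 300  # seconds; MIT Kerberos default for `clockskew`


def within_skew(offset_seconds, limit=MAX_CLOCK_SKEW):
    """True if a clock offset is inside the Kerberos tolerance."""
    return abs(offset_seconds) <= limit


def days_until_skew_exceeded(drift_ppm, limit=MAX_CLOCK_SKEW):
    """Days an unsynchronized clock drifting at drift_ppm needs to exceed limit."""
    seconds = limit / (drift_ppm * 1e-6)
    return seconds / 86400


if __name__ == "__main__":
    print("7h offset within tolerance:", within_skew(7 * 3600))
    print("days to exceed tolerance at 100 ppm:",
          round(days_until_skew_exceeded(100), 1))
```

Under these assumptions a 7-hour step is far outside the tolerance immediately, whereas gradual drift alone would need weeks without any synchronization to cause the same failure.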

milestone: NEEDS_TRIAGE => SSSD Future releases (no date set yet)

The point is that if this is a regular problem that can affect a lot of users because of some time drift issue in a VM, that is one thing, and it needs to be addressed sooner rather than later. If you say that the issue only shows up when the time is wildly wrong (which is rarer), that lowers the priority. I am not saying it is something we should fix now in the current release, but it also does not seem to be a case that should be thrown to the bottom of the backlog. There might be other ways to deal with it, though. I am not saying it must be fixed inside SSSD; it might be addressed in some way outside of SSSD when the time is corrected, and that could be done faster.

7 months ago

Metadata Update from @vojamo:
- Issue set to the milestone: SSSD Future releases (no date set yet)
