Learn more about these different git repos.
Other Git URLs
We are running SSSD 1.8.3 on openSUSE 12.2 using KRB5 and LDAP. The LDAP access is also kerberized, i.e., SSSD uses GSSAPI to authenticate to the LDAP server (see sssd.conf attached).
After returning from a lunch break, a user was not able to unlock the KDE screensaver at around 13:02. Entering user (iscan) password and confirming simply did not do anything, i.e. he was prompted again by KDE (multiple trials between 13:06 - 13:08).
root 942 0.0 0.0 78372 1624 ? Ss Nov15 0:13 /usr/sbin/sssd -D -f root 945 0.0 0.0 66876 2192 ? S Nov15 0:06 \_ /usr/lib/sssd/sssd_nss --debug-to-files root 946 0.0 0.0 66700 2036 ? S Nov15 0:05 \_ /usr/lib/sssd/sssd_pam --debug-to-files root 28586 0.0 0.2 149332 8528 ? S 13:03 0:00 \_ /usr/lib/sssd/sssd_be --domain default --debug-to-files
The PID and starting time of sssd_be suggest, that it was restarted just after the user tried to login. Possibly after a crash? All the other sssh binaried started days before on system boot. Looking at
grep sssd /var/log/messages
... Nov 19 13:03:35 st-iscan sssd[be[default]]: Starting up ... Nov 19 13:06:53 st-iscan kcheckpass[28512]: pam_sss(xdm:auth): Request to sssd failed. Timer expired Nov 19 13:06:53 st-iscan kcheckpass[28512]: Authentication failure for iscan (invoked by uid 10052) ... Nov 19 13:07:28 st-iscan kcheckpass[28520]: pam_sss(xdm:auth): Request to sssd failed. Timer expired Nov 19 13:07:28 st-iscan kcheckpass[28520]: Authentication failure for iscan (invoked by uid 10052) ... Nov 19 13:07:49 st-iscan kcheckpass[28521]: pam_sss(xdm:auth): Request to sssd failed. Timer expired Nov 19 13:07:49 st-iscan kcheckpass[28521]: Authentication failure for iscan (invoked by uid 10052) ... Nov 19 13:08:09 st-iscan kcheckpass[28582]: pam_sss(xdm:auth): Request to sssd failed. Timer expired Nov 19 13:08:09 st-iscan kcheckpass[28582]: Authentication failure for iscan (invoked by uid 10052) ... Nov 19 13:08:32 st-iscan kcheckpass[28585]: pam_sss(xdm:auth): Request to sssd failed. Timer expired Nov 19 13:08:32 st-iscan kcheckpass[28585]: Authentication failure for iscan (invoked by uid 10052)
as well as
grep default /var/log/sssd/sssd.log
seems to confirm this.
Login at 13:01:53 fails as sssd_default seems to crash when performing LDAP search. Module sssd_default restarts, but login still not possible, maybe due to broken connection to sssd/ldap/krb module after restart (Timer expired error around 13:06-13:08)?
Restart SSSD service, KDE login then succeeds.
attachment sssd_default.log
attachment sssd.conf
Remark: by module "sssd_default" I actually mean "sssd_be"...
The problem just happened again and I had an strace running on "sssd_be". As I am not sure, if the strace contains sensitive material like passwords or Kerberos tokens, I do not want to post it here entirely, but just a snippet.
The IP 192.168.109.3 is our Kerberos KDC, last logfile entry in "sssd_default.log" is at 15:32:30, saying "Executing sasl bind mech: GSSAPI, user: (null)", then, a little while later, "sssd_be" gets killed by "sssd" as it is not responding.
From the strace, one can see that "sssd_be" gets caught in a loop somehow, polling fd 23 indefinitely:
... Last logfile entry: 30668 write(13, "(Mon Nov 19 15:32:30 2012) [sssd[be[default]]] [sdap_get_tgt_recv] (0x0400): ", 77) = 77 30668 write(13, "Child responded: 0 [FILE:/var/lib/sss/db/ccache_LNT.EI.TUM.DE], expired on [1353421950]\n", 88) = 88 30668 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0 30668 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0 30668 write(13, "(Mon Nov 19 15:32:30 2012) [sssd[be[default]]] [sasl_bind_send] (0x0100): ", 74) = 74 30668 write(13, "Executing sasl bind mech: GSSAPI, user: (null)\n", 47) = 47 30668 uname({sys="Linux", node="st-iscan", ...}) = 0 ... lots of Kerberos processing stripped ... 30668 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 23 30668 fcntl(23, F_SETFD, FD_CLOEXEC) = 0 30668 ioctl(23, FIONBIO, [1]) = 0 30668 connect(23, {sa_family=AF_INET, sin_port=htons(88), sin_addr=inet_addr("192.168.109.3")}, 16) = 0 30668 sendto(23, "<KRB DATA STRIPPED>"..., 696, 0, NULL, 0) = 696 30668 poll([{fd=23, events=POLLIN}], 1, 1000) = 0 (Timeout) 30668 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 24 30668 fcntl(24, F_SETFD, FD_CLOEXEC) = 0 30668 ioctl(24, FIONBIO, [1]) = 0 30668 setsockopt(24, SOL_SOCKET, SO_LINGER, {onoff=0, linger=0}, 8) = 0 30668 connect(24, {sa_family=AF_INET, sin_port=htons(88), sin_addr=inet_addr("192.168.109.3")}, 16) = -1 EINPROGRESS (Operation now in progress) 30668 poll([{fd=23, events=POLLIN}, {fd=24, events=POLLIN|POLLOUT}], 2, 1000) = 1 ([{fd=24, revents=POLLIN|POLLOUT|POLLERR|POLLHUP}]) 30668 close(24) = 0 30668 close(-1) = -1 EBADF (Bad file descriptor) 30668 poll([{fd=23, events=POLLIN}], 1, 1000) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, 74) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, 74) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, 74) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, 74) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, 74) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, 74) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, 74) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, 74) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, 74) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, 74) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, 73) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, 73) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, 73) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, 73) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, 73) = 1 ([{fd=23, revents=POLLIN}]) ... counting down from 73 to negative numbers until ... 30668 poll([{fd=23, events=POLLIN}], 1, -107791) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107791) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107791) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107791) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107791) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792) = 1 ([{fd=23, revents=POLLIN}]) 30668 poll([{fd=23, events=POLLIN}], 1, -107792 <ptrace(SYSCALL):No such process> 30668 +++ killed by SIGKILL +++
summary: Login fails after some idle time - default module crashed? => Login fails - sssd_be module polling fd indefinitely and gets killed
So there seems to be two problems here:\ 1) sssd_be gets caught in a loop and is not responding while talking to the KDC,\ 2) sssd_be gets killed and restarts ok, but login then is still not possible from KDE (with Timer expired error). Note that a SSH login (with a different username) is working though.
Thank you for the bug report. We've seen a report like this in the past but we haven't been able to gather enough debugging information. Your report is definitely useful.
Problem 2) is far more concerning - if we can't recover from a crash, then we might leave the user locked out (which is basically what happened to you).
Is the bug reproducable in any way?
Well, no, not exactly so far. I tried to simulate a hang of sssd_be by issuing a "pkill --signal STOP sssd_be". Do so, the pings to sssd_be fail and it gets killed and restarted after about 5+ ping trials.
If I try to login to the box while sssd_be is in STOP state, nothing happens (obviously). But also, I cannot login, once sssd_be restarted. I was not able to reproduce this 100%, I think the key part is to do a login trial while sssd_be is not responding.
This situation comes very close to the original problem, but I do not get the Timer expired error message in /var/log/messages. Hence, it's not 100% the identical to the problem #2 described above.
Steps: 1. pkill --signal STOP sssd_be 2. Try to log in, nothing happens (expected). In sssd_pam.log, you can see an authentication request getting sent. 3. Wait for sssd_be getting killed and restarted. 4. Try to log in again, still not working. In sssd_pam.log, it says that 1) it successfully reconnected to the data provider, but 2) an identical authentication request already exists. 5. I waited 2-3 more minutes and the restarted sssd to make login work again.
Just to mention another problem that is caused most likely by this bug :-)
Once sssd_be stops working correctly, bash-completion for files and directories (even as root user) hangs! That is, pressing the TAB key causes bash to hang.
The reason seems to be, that bash queries NSS (libnss_sss.so ) about uid/gid information, but does not get response.
Bug appeared again this Monday morning (after machine idling over the weekend).
No new insight on problem 1) why sssd_be is crashing exactly, but the issue with the bug only appearing after some idle time suggests that it could be Kerberos related (i.e. maybe TGT running out and crashing while getting it again)?
About problem 2) I can confirm that user is locked out! The actual bug looks mostly identical to the simulated case with "pkill --signal STOP sssd_be" from above, so that might be a good starting point for debugging. I can see that sssd_be restarted after crash, but no login possible.
I did some strace on the kscreenlocker and agetty, which both are unable to login the user. Both send a request to sssd_pam, which logs "Identical request in progress" and then does nothing. Login is working again after a kill/restart sssd_pam!
Link to the right BZ when identified.
milestone: NEEDS_TRIAGE => SSSD 1.9.4
Is there anything further I can do to help? Currently I am unable to further diagnose the reason for the crash (problem 1), but the user lockout (problem 2) seems to be reproducible by the steps in comment 5.
Possibly, problem 1 and 2 could be split into two tickets to investigate them independently...?
Sorry for the delay, we've just released 1.9.3 yesterday. This bug is one of the highest priorities in the 1.9.4 cycle, so we'll be looking into it in the next couple of days.
May be same as #1564
rhbz: => 0
I think I found the reason for problem 1), the indefinite polling shown in the strace.
It is most likely bug in krb5-10.x, causing an indefinite loop during KDC communications: https://github.com/krb5/krb5/commit/2b06a22f7fd8ec01fb27a7335125290b8ceb6f18
I compiled a test package with this commit for openSUSE12.2, started testing and will report back. I am confident that problem 1 will be solved by this.
Problem 2) then still open!
I tried to reproduce by using SIGSTOP, but as soon as sssd_be is killed I can login again. I tried using ssh, will try some other programs, I do not have the KDS screensaver handy though
Ok I reproduced with the gnome screensaver. I know exactly what is going on now, working on a fix.
Copying the links to the BZ over from the dup.
rhbz: 0 => [https://bugzilla.redhat.com/show_bug.cgi?id=858345 858345], [https://bugzilla.redhat.com/show_bug.cgi?id=882414 882414]
Fields changed
owner: somebody => simo patch: 0 => 1
resolution: => fixed status: new => closed
Metadata Update from @endzone: - Issue assigned to simo - Issue set to the milestone: SSSD 1.9.4
SSSD is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in SSSD's github repository.
This issue has been cloned to Github and is available here: - https://github.com/SSSD/sssd/issues/2697
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
subscribe
Thank you for understanding. We apologize for all inconvenience.
Login to comment on this ticket.