Learn more about these different git repos.
Other Git URLs
Problem description:
When autofs uses direct mappings and has SSSD as a provider, there is a race condition on upstart and systemd where autofs starts before SSSD's responders are active even though autofs is set to start after SSSD. This happens because SSSD is marked as started right after the monitor forks, instead of waiting until the responders are up. When this happens, autofs needs to be restarted manually (HUP) to start offering those mappings.
Test case:
If the environment has booted correctly, there won’t be errors:
ubuntu@test-hostl:~$ ls /direct ok
If autofs started before sssd finished starting an error will happen:
ubuntu@test-host:~$ ls /direct ls: cannot access '/direct': No such file or directory
And the output from syslog will show the following output (I added a logger message in the post-start script to determine when was SSSD marked as started) :
Jul 5 16:23:25 test-host logger: SSSD started Jul 5 16:23:26 test-host automount[1137]: Starting automounter version 5.0.7, master map /etc/auto.master Jul 5 16:23:26 test-host automount[1137]: using kernel protocol version 5.02 Jul 5 16:23:26 test-host automount[1137]: setautomntent: lookup(sss): setautomntent: Connection refused Jul 5 16:23:26 test-host automount[1137]: mounted indirect on /home with timeout 300, freq 75 seconds Jul 5 16:23:26 test-host sssd: Starting up Jul 5 16:23:26 test-host sssd[be[openstacklocal]]: Starting up Jul 5 16:23:26 test-host sssd[autofs]: Starting up Jul 5 16:23:26 test-host sssd[pam]: Starting up Jul 5 16:23:26 test-host sssd[nss]: Starting up Jul 5 16:23:46 test-host automount[1137]: attempting to mount entry /home/ubuntu Jul 5 16:23:46 test-host automount[1137]: mounted /home/ubuntu
I was able to reproduce this on SSSD 1.13.4 and 1.11.8 (haven't tested other releases)
Possible fix:
Reverting commit https://git.fedorahosted.org/cgit/sssd.git/commit/?id=d19c4785215305e6eb5f2fa2fc503a2ba50d3f10 to create the pidfile after the responders are up, and waiting for it in the upstart/systemd jobs fixes the issue for me. A patch based on that commit is attached for sssd-1.13.4
attachment pidfile_xenial.diff
Sorry for the late reply. I think this should better be solved by #2371. If you don't mind, I would like to close this ticket as a duplicate of #2371.
Replying to [comment:1 jhrozek]:
Hi jhrozek,
there are two different race conditions going on: 1. SSSD starts before AutoFS, but finishes the startup process after AutoFS has started (the case discussed in this bug). The main problem here is that neither upstart nor systemv are able to know when SSSD has fully started (responders available) 2. AutoFS starts after SSSD's responders are up but before the first LDAP query populates the cache (#2371). I've seen this too, and used a workaround (After=network.target in src/sysv/systemd/sssd.service.in).
In my opinion those are two different bugs, even though they might look similar.
I'm sorry, but can you add logs of the first case? I would like to see how it's different from 2.
Hi, sorry for the late answer. Please see the attached logfiles:
-unpatched.txt: This shows a boot where SSSD finishes its boot after autofs ran lookup_sss. In this case, there's no retry from autofs and all direct maps are not mounted, requiring a HUP from autofs.
-patched.txt: in this case, autofs starts after sssd has finished booting and it's able to read the maps from sssd and mount them.
Note: there's a logger message showing when does the upstart job 'finish', so we can compare the expected vs real finish time.
_comment0: Hi, sorry for the late answer. Please see the attached logfiles: -unpatched.txt: This shows a boot where SSSD finishes its boot after autofs ran lookup_sss. In this case, there's no retry from autofs and all direct maps are not mounted, requiring a HUP from autofs. -patched.txt: in this case, autofs starts after sssd has finished booting and it's able to read the maps from sssd and mount them.
Note: there's a logger message showing when does the upstart job 'finish', so we can compare the expected vs real finish time. => 1469214344878623
attachment patched.txt
attachment unpatched.txt
I was finally able to reproduce the bug.
owner: somebody => jhrozek status: new => assigned
Fields changed
patch: 1 => 0
Patch on list.
patch: 0 => 1
btw can you send the pidfile patch to the list?
Done, sorry for the delay, I misunderstood the 'patch on list' message :)
Sorry, that's my fault, I thought this ticket is in fact caused by the same issue as ticket #3140 but the discussion in launchpad proved otherwise. So I'll keep this ticket separate and we can finish reviewing your patch on sssd-devel.
milestone: NEEDS_TRIAGE => SSSD 1.14.2
Just wanted to note that I installed the patch from sssd-devel and tested it in a situation which usually had problems. It does appear to work fine; according to the logs, sssd returns cached data and all is well.
However, then I found that if I log in quickly enough, I can get into the system without a kerberos ticket. I guess it's using cached credentials in that case. And then I guess any NFS mounts would fail even if autofs starts OK. So pushing some startup back until after the network is up is probably the only general solution.
master:
sssd-1-13:
resolution: => fixed status: assigned => closed
Replying to [comment:12 lslebodn]:
master: * b9e155d sssd-1-13: * f557e46
master: * b9e155d
sssd-1-13: * f557e46
Sorry, my patches fix a different issue after all (#3140), reopening.
resolution: fixed => status: closed => reopened
owner: jhrozek => vtapia status: reopened => new
Moving tickets that didn't make it into the 1.14.2 release into the next point release.
milestone: SSSD 1.14.2 => SSSD 1.14.3
cc: => orion@cora.nwra.com
sssd-1-14:
resolution: => fixed status: new => closed
Metadata Update from @vtapia: - Issue assigned to vtapia - Issue set to the milestone: SSSD 1.14.3
SSSD is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in SSSD's github repository.
This issue has been cloned to Github and is available here: - https://github.com/SSSD/sssd/issues/4113
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
subscribe
Thank you for understanding. We apologize for all inconvenience.
Log in to comment on this ticket.