#3080 Race condition on boot between SSSD and Autofs
Closed: Fixed None Opened 2 years ago by vtapia.

Problem description:

When autofs uses direct mappings and has SSSD as a provider, there is a race condition on upstart and systemd where autofs starts before SSSD's responders are active even though autofs is set to start after SSSD. This happens because SSSD is marked as started right after the monitor forks, instead of waiting until the responders are up. When this happens, autofs needs to be restarted manually (HUP) to start offering those mappings.

Test case:

  • Configure an LDAP server containing the direct mappings for autofs (/direct)
  • Configure SSSD to read from that LDAP server
  • Add ‘automount: sss’ to /etc/nsswitch.conf
  • Reboot the machine, login, and try reading the contents of a direct mapping.
  • If the environment has booted correctly, there won’t be errors:

    ubuntu@test-hostl:~$ ls /direct
    ok

  • If autofs started before sssd finished starting an error will happen:

    ubuntu@test-host:~$ ls /direct
    ls: cannot access '/direct': No such file or directory

And the output from syslog will show the following output (I added a logger message in the post-start script to determine when was SSSD marked as started) :

Jul  5 16:23:25 test-host logger: SSSD started
Jul  5 16:23:26 test-host automount[1137]: Starting automounter version 5.0.7, master map /etc/auto.master
Jul  5 16:23:26 test-host automount[1137]: using kernel protocol version 5.02
Jul  5 16:23:26 test-host automount[1137]: setautomntent: lookup(sss): setautomntent: Connection refused
Jul  5 16:23:26 test-host automount[1137]: mounted indirect on /home with timeout 300, freq 75 seconds
Jul  5 16:23:26 test-host sssd: Starting up
Jul  5 16:23:26 test-host sssd[be[openstacklocal]]: Starting up
Jul  5 16:23:26 test-host sssd[autofs]: Starting up
Jul  5 16:23:26 test-host sssd[pam]: Starting up
Jul  5 16:23:26 test-host sssd[nss]: Starting up
Jul  5 16:23:46 test-host automount[1137]: attempting to mount entry /home/ubuntu
Jul  5 16:23:46 test-host automount[1137]: mounted /home/ubuntu

I was able to reproduce this on SSSD 1.13.4 and 1.11.8 (haven't tested other releases)

Possible fix:

Reverting commit https://git.fedorahosted.org/cgit/sssd.git/commit/?id=d19c4785215305e6eb5f2fa2fc503a2ba50d3f10 to create the pidfile after the responders are up, and waiting for it in the upstart/systemd jobs fixes the issue for me. A patch based on that commit is attached for sssd-1.13.4


Sorry for the late reply. I think this should better be solved by #2371. If you don't mind, I would like to close this ticket as a duplicate of #2371.

Replying to [comment:1 jhrozek]:

Sorry for the late reply. I think this should better be solved by #2371. If you don't mind, I would like to close this ticket as a duplicate of #2371.

Hi jhrozek,

there are two different race conditions going on:
1. SSSD starts before AutoFS, but finishes the startup process after AutoFS has started (the case discussed in this bug). The main problem here is that neither upstart nor systemv are able to know when SSSD has fully started (responders available)
2. AutoFS starts after SSSD's responders are up but before the first LDAP query populates the cache (#2371). I've seen this too, and used a workaround (After=network.target in src/sysv/systemd/sssd.service.in).

In my opinion those are two different bugs, even though they might look similar.

I'm sorry, but can you add logs of the first case? I would like to see how it's different from 2.

Hi, sorry for the late answer. Please see the attached logfiles:

-unpatched.txt: This shows a boot where SSSD finishes its boot after autofs ran lookup_sss. In this case, there's no retry from autofs and all direct maps are not mounted, requiring a HUP from autofs.

-patched.txt: in this case, autofs starts after sssd has finished booting and it's able to read the maps from sssd and mount them.

Note: there's a logger message showing when does the upstart job 'finish', so we can compare the expected vs real finish time.

_comment0: Hi, sorry for the late answer. Please see the attached logfiles:
-unpatched.txt: This shows a boot where SSSD finishes its boot after autofs ran lookup_sss. In this case, there's no retry from autofs and all direct maps are not mounted, requiring a HUP from autofs.
-patched.txt: in this case, autofs starts after sssd has finished booting and it's able to read the maps from sssd and mount them.

Note: there's a logger message showing when does the upstart job 'finish', so we can compare the expected vs real finish time. => 1469214344878623

I was finally able to reproduce the bug.

owner: somebody => jhrozek
status: new => assigned

Fields changed

patch: 1 => 0

Patch on list.

patch: 0 => 1

btw can you send the pidfile patch to the list?

Done, sorry for the delay, I misunderstood the 'patch on list' message :)

Sorry, that's my fault, I thought this ticket is in fact caused by the same issue as ticket #3140 but the discussion in launchpad proved otherwise. So I'll keep this ticket separate and we can finish reviewing your patch on sssd-devel.

milestone: NEEDS_TRIAGE => SSSD 1.14.2

Just wanted to note that I installed the patch from sssd-devel and tested it in a situation which usually had problems. It does appear to work fine; according to the logs, sssd returns cached data and all is well.

However, then I found that if I log in quickly enough, I can get into the system without a kerberos ticket. I guess it's using cached credentials in that case. And then I guess any NFS mounts would fail even if autofs starts OK. So pushing some startup back until after the network is up is probably the only general solution.

master:

sssd-1-13:

resolution: => fixed
status: assigned => closed

Replying to [comment:12 lslebodn]:

master:
* b9e155d

sssd-1-13:
* f557e46

Sorry, my patches fix a different issue after all (#3140), reopening.

resolution: fixed =>
status: closed => reopened

Fields changed

owner: jhrozek => vtapia
status: reopened => new

Moving tickets that didn't make it into the 1.14.2 release into the next point release.

milestone: SSSD 1.14.2 => SSSD 1.14.3

Fields changed

cc: => orion@cora.nwra.com

master:

sssd-1-14:

sssd-1-13:

resolution: => fixed
status: new => closed

Metadata Update from @vtapia:
- Issue assigned to vtapia
- Issue set to the milestone: SSSD 1.14.3

2 years ago

Login to comment on this ticket.

Metadata