#1357 Init script reports complete before sssd is actually working
Closed: Fixed None Opened 6 years ago by tljohnsn.

I installed a RHEL 6.2 system on which the sshd privilege-separation user account exists only in Active Directory and not in the local /etc/passwd file. Under these conditions sshd, which is supposed to start directly after sssd, fails to start.

Instead the following error is returned:
[ OK ] sssd: [ OK ]
Starting sshd: Privilege separation user sshd does not exist
[FAILED]

The system sssd version is sssd-1.5.1-66.el6_2.3.x86_64


How did you configure SSSD?

Can you please attach the sanitized sssd.conf file, set debug_level=9 for sssd, and attach a sanitized log that reproduces this issue?

Also nsswitch.conf would be handy.

I suspect this could have the same root cause as https://fedorahosted.org/sssd/ticket/1303. That is, when the request for the sshd user comes in, the SSSD NSS responder is not up yet.

Does sshd start if you launch it manually afterwards, or if you add an artificial delay to the init script?

Well, I think this is a slightly different issue. Yes, we're taking more time than we'd like to start up, but the bug here is that 'service sssd start' returns success before the SSSD is actually fully started.

Essentially, the init script will always return success as soon as the monitor starts and its pidfile is created. Maybe we need to defer the creation of the pidfile until the responders have been successfully started. This will most likely slow down startup, since it can take a couple seconds for slow domains to load.

I have attached the requested sssd configuration with debug_level=9, as well as the sssd.log file and nsswitch.conf. Please let me know if you require any of the other sssd log files.

I have worked around the problem by creating another init script that runs between sssd and sshd. That script does nothing but sleep five seconds. With that hack in place, or by starting sshd manually on the console, things will work properly.
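
A less timing-dependent variant of this workaround would poll NSS until the account resolves instead of sleeping for a fixed interval. The following is only a minimal sketch: it assumes `getent` is available, and the helper name `wait_for_user` and its 30-second default timeout are illustrative choices, not part of any SSSD or sshd script.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with SSSD): block until the given
# account can be resolved through NSS, or give up after a timeout.
# Usage: wait_for_user USER [TIMEOUT_SECONDS]
wait_for_user() {
    user=$1
    timeout=${2:-30}    # assumed default; tune for slow domains
    elapsed=0
    # getent goes through nsswitch.conf, so it only succeeds once the
    # sss NSS module can actually answer for this user.
    while ! getent passwd "$user" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1    # still unresolved after the timeout
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

An intermediate init script could then call something like `wait_for_user sshd 30` before sshd is started, so the delay lasts only as long as the lookup actually needs.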

Replying to [comment:3 sgallagh]:

> Well, I think this is a slightly different issue. Yes, we're taking more time than we'd like to start up, but the bug here is that 'service sssd start' returns success before the SSSD is actually fully started.
>
> Essentially, the init script will always return success as soon as the monitor starts and its pidfile is created. Maybe we need to defer the creation of the pidfile until the responders have been successfully started. This will most likely slow down startup, since it can take a couple seconds for slow domains to load.

This is starting to sound like a good idea; I got another report from a different user who has the same problem with autofs.

Maybe we could delay reporting that startup has finished only until the responders are up; the responders can return most of the data from the cache until the providers are started.

Well, in most cases this is actually no different from waiting for the providers to start up.

In the monitor code, we start the providers first and then wait for them to complete[*] before starting the responders.

[*] Well, if the providers do not start up within five seconds, we force the providers to start up anyway.

Fields changed

milestone: NEEDS_TRIAGE => SSSD 1.9.0

Fields changed

owner: somebody => okos

Fields changed

milestone: SSSD 1.9.0 => SSSD 1.9.1

Fields changed

owner: okos => pbrezina
status: new => assigned

Fields changed

patch: 0 => 1

master:

resolution: => fixed
status: assigned => closed

Looks like there is still an issue:

[root@ipa17-devel ~]# /etc/init.d/sssd stop && rm -f /var/lib/sss/db/*.ldb /var/lib/sss/mc/* /var/log/sssd/* && /etc/init.d/sssd start && date && id admin; sleep 3 && date && id admin
Stopping sssd (via systemctl):                             [  OK  ]
Starting sssd (via systemctl):                             [  OK  ]
Mo 15. Okt 17:12:30 CEST 2012
id: admin: no such user
Mo 15. Okt 17:12:33 CEST 2012
uid=1662200000(admin) gid=1662200000(admins) Gruppen=1662200000(admins)

According to the logs, the NSS responder had not yet started at the time of the first request:

(Mon Oct 15 17:12:31 2012) [sssd[nss]] [server_setup] (0x0400): CONFDB: /var/lib/sss/db/config.ldb

and only the second request is recorded in the log files.

milestone: SSSD 1.9.1 => NEEDS_TRIAGE
resolution: fixed =>
status: closed => reopened

Can you paste sssd.log and configuration please?

It looks like systemd ignores our pid file. At the time systemd reports OK, the pid file does not exist yet.

/etc/init.d/sssd stop && rm -f /var/lib/sss/db/*.ldb /var/lib/sss/mc/* /var/log/sssd/* && /etc/init.d/sssd start && date && cat /var/run/sssd.pid ; id user-1; sleep 3 && date && cat /var/run/sssd.pid ; id user-1
Stopping sssd (via systemctl):                             [  OK  ]
Starting sssd (via systemctl):                             [  OK  ]
Tue Oct 16 13:47:44 CEST 2012
cat: /var/run/sssd.pid: No such file or directory
id: user-1: No such user
Tue Oct 16 13:47:47 CEST 2012
31118
uid=10001(user-1) gid=10001 groups=10001

We use the "forking" type in the unit file. According to man systemd.service, systemd does not monitor the existence of the pid file; it only reads it to learn which process is the main one. Systemd returns OK as soon as the original process exits (after the fork).

We have two options here:

  • postpone exiting after fork
  • become another type of service (probably notify)
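
The first option can be illustrated with a small shell analogy: the launching process blocks until the background worker signals readiness, and only then returns. This is a sketch of the general handshake, not SSSD's actual implementation; the function name `start_and_wait`, the FIFO, and the `sleep 1` stand-in for initialization are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of "postpone exiting after fork": the caller returns only
# after the background worker has reported that it is ready.
start_and_wait() {
    dir=$(mktemp -d) || return 1
    fifo="$dir/ready"
    mkfifo "$fifo" || { rm -rf "$dir"; return 1; }
    (
        sleep 1                 # stand-in for slow initialization
        echo ready >"$fifo"     # tell the launcher we are up
        # a real daemon would keep running here
    ) &
    # Blocks until the worker opens the FIFO and writes to it.
    # Note: this simple sketch has no timeout, so it would hang
    # if the worker died before signalling.
    read -r status <"$fifo"
    rm -rf "$dir"
    [ "$status" = ready ]
}
```

Because the launcher does not return before the readiness message arrives, a service manager that treats the exit of the original process as "started" would see success only once initialization has finished.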

Can you check whether SysV init behaves sanely in this respect?

Fields changed

milestone: NEEDS_TRIAGE => SSSD 1.9.3

Fixed (again) in sssd-1-9:

  • 3a3e1a3 create pid file immediately after fork again
  • d80485d exit original process after sssd is initialized
  • 6ec69c2 make monitor_quit() usable outside signal handler
  • 8b130e4 fix indendation, coding style and debug levels in server.c
  • 3d73923 add SSSDBG_IMPORTANT_INFO macro

and master:

  • d19c478 create pid file immediately after fork again
  • 715e09e exit original process after sssd is initialized
  • e02ec73 make monitor_quit() usable outside signal handler
  • 53b475b fix indendation, coding style and debug levels in server.c
  • fa3e287 add SSSDBG_IMPORTANT_INFO macro

design: =>
design_review: => 0
fedora_test_page: =>

Fields changed

resolution: => fixed
status: reopened => closed

Metadata Update from @tljohnsn:
- Issue assigned to pbrezina
- Issue set to the milestone: SSSD 1.9.3

2 years ago
