#3529 SSSD-kcm/secrets failed to restart during/after upgrade
Closed: Fixed 6 years ago Opened 6 years ago by benzea.

I did a dnf update today and was stuck without a working Kerberos ticket cache. It appears that there were some issues restarting the service, which left it unusable afterwards. A simple systemctl restart sssd-kcm.socket fixed the issue.

Attaching both the system log and the DNF output.


Can you reproduce the issue? In the system log, I can only see that KCM had some issues, as you say, and that syslog reported the socket was already there.

Could you reproduce the issue again, this time adding:

[kcm]
debug_level=10
debug_microseconds=true

[secrets]
debug_level=10
debug_microseconds=true

to sssd.conf and restarting the sssd service?
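If editing the file by hand is inconvenient, the same options could be added with a small script. This is only a rough sketch: the helper name `enable_debug` is made up for illustration, it works on the text of the config rather than on the real file, and Python's `configparser` drops comments when it rewrites the text, so keep a backup of the original sssd.conf.

```python
import configparser
import io


def enable_debug(conf_text, sections=("kcm", "secrets")):
    """Return conf_text with the debug options from this ticket set.

    Hypothetical helper for illustration; not part of SSSD.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    for name in sections:
        # Create the section if the existing config does not have it yet.
        if not parser.has_section(name):
            parser.add_section(name)
        parser.set(name, "debug_level", "10")
        parser.set(name, "debug_microseconds", "true")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # Made-up sample input, not a real sssd.conf.
    sample = "[sssd]\nservices = nss, pam\n\n[kcm]\ndebug_level = 2\n"
    print(enable_debug(sample))
```

The sssd service still has to be restarted afterwards for the new debug level to take effect.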

Hm, I don't seem to be able to reproduce this right now. Though maybe the order/timing of restarts in the post install scripts is relevant?

Maybe... What did you upgrade from and to?

Hm, the attachment is being slightly mishandled, but I think it should be there.

https://pagure.io/SSSD/sssd/issue/raw/files/9a3c92c3286adc4d1040594531a0fe2440afd6178ace5366561a7b6c21a71df4-dnf_history_info_last

Not sure if this might be related; I just unsuspended the machine, and got stuck with a non-working SSSD-KCM. Though it looked a bit like sssd-secrets was stuck on something and sssd-kcm just refused to work (or even stop) at that point.

Attaching an strace of sssd-kcm; sorry, I don't have anything else at this point. But it keeps trying to do a 'sendto(14, "GET /kcm/persistent/1000/ccache/"..., 115, MSG_NOSIGNAL, NULL, 0) = 115'. Running "klist -A" resulted in "klist: Internal credentials cache error while listing ccache collection" (also attaching strace).

I have now enabled the debugging features, so let's hope that something more useful comes out of that.
klist-A-failure-after-suspend-for-more-than-a-day

strace-sssd-kcm-hanging

Seems to be the same bug as in fedora ticket https://bugzilla.redhat.com/show_bug.cgi?id=1494843#c4

Debug log files will be more useful than strace output
https://bugzilla.redhat.com/show_bug.cgi?id=1494843#c12

@benzea Do you use GNOME Online Accounts + Kerberos? Or can you reproduce it with plain kinit?

GNOME Online Accounts is obviously running, but I have always added my Kerberos identities by running kinit every time.

So, I just ran into it, and after a short chat with Patrick Uiterwijk it looks like the race condition is simply that systemd has not yet opened the socket when the sssd service is being activated, i.e. what happens is:

  • sssd-*.service is started
  • sssd-*.socket is also triggered (but the socket is not bound yet)
  • daemon comes up
  • daemon binds to the socket as systemd has not done so yet
  • systemd fails to bind the socket and the sssd-*.socket units fail to start up
  • systemd stops sssd-*.service as the socket failed
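The core of the sequence above, two parties trying to bind the same Unix socket path, can be reproduced outside of SSSD. The following is a minimal sketch, not SSSD or systemd code: the function names and the path "kcm.socket" inside a temporary directory are made up for the demonstration, and the second bind merely stands in for what systemd does when it starts the .socket unit.

```python
import errno
import os
import socket
import tempfile


def bind_unix(path):
    """Bind and listen on a Unix stream socket at path."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
    except OSError:
        sock.close()
        raise
    sock.listen(1)
    return sock


def simulate_race():
    """Return the errno seen by the second binder, or None if it succeeds."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "kcm.socket")  # illustrative path only
        daemon_sock = bind_unix(path)  # the daemon gets there first
        try:
            manager_sock = bind_unix(path)  # stands in for systemd's bind
        except OSError as exc:
            return exc.errno
        else:
            manager_sock.close()
            return None
        finally:
            daemon_sock.close()


if __name__ == "__main__":
    err = simulate_race()
    print(errno.errorcode.get(err, err))
```

On Linux the second bind is expected to fail with EADDRINUSE because the socket file already exists, which would match the "socket was already there" message seen in syslog.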

There are different possible fixes for this:
* add proper Before=/After= lines
* prevent the service from ever trying to bind to the socket if running under systemd
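To illustrate the second option: systemd hands pre-opened sockets to an activated service through the LISTEN_PID and LISTEN_FDS environment variables, with the descriptors starting at 3. A service can check for those and only bind on its own when nothing was passed in. The sketch below is one possible reading of that idea and is not the actual SSSD change; the function names are invented for illustration.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first inherited descriptor in systemd's protocol


def inherited_listen_fds(environ=None, pid=None):
    """Return the descriptors passed by systemd, or [] if there are none."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        # LISTEN_PID must name this process, otherwise the fds are not ours.
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + max(count, 0)))


def get_listening_socket(path, environ=None, pid=None):
    """Use the socket inherited from systemd if present; bind otherwise."""
    fds = inherited_listen_fds(environ, pid)
    if fds:
        # Socket activation: never bind ourselves, reuse systemd's socket.
        return socket.socket(fileno=fds[0])
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen(16)
    return sock
```

Note that this only covers the case where systemd actually passed a socket; a service started directly by its .service unit before the .socket unit is up would still fall through to its own bind, which is why the unit ordering matters as well.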

On (03/11/17 10:17), Benjamin Berg wrote:

> So, I just ran into it, and after a short chat with Patrick Uiterwijk it looks like the race condition is simply that systemd has not yet opened the socket when the sssd service is being activated, i.e. what happens is:
>
>   • sssd-*.service is started
>   • sssd-*.socket is also triggered (but the socket is not bound yet)
>   • daemon comes up
>   • daemon binds to the socket as systemd has not done so yet
>   • systemd fails to bind the socket and the sssd-*.socket units fail to start up
>
> There are different possible fixes for this:
> * add proper Before=/After= lines
> * prevent the service from ever trying to bind to the socket if running under systemd

I checked a few other socket-activated services.

Most socket-activated services use "Requires=$name.socket" instead of
Before=/After=, and some of them use Wants= plus After=.

I will check with the systemd guys.

LS

Metadata Update from @lslebodn:
- Issue tagged with: PR

6 years ago

Metadata Update from @lslebodn:
- Custom field version adjusted to 1.15.3

6 years ago

Metadata Update from @lslebodn:
- Issue close_status updated to: Fixed
- Issue set to the milestone: SSSD 1.16.1
- Issue status updated to: Closed (was: Open)

6 years ago

Metadata Update from @lslebodn:
- Issue assigned to lslebodn

6 years ago

SSSD is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in SSSD's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/SSSD/sssd/issues/4555

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.
