#2384 sssd service stops with sssd-ad backend
Closed: Invalid. Opened 9 years ago by dpal.

Ticket was cloned from Red Hat Bugzilla (product Red Hat Enterprise Linux 6): Bug 1107323

Please note that this Bug is private and may not be accessible as it contains confidential Red Hat customer information.

Description of problem:
The sssd service stops without a segfault or coredump, and trying to start it again fails with the error "sssd dead but subsys locked". The workaround is to manually remove the lock file. This results in users being logged off. It is currently happening on 3 nodes of an 8-node cluster.


Version-Release number of selected component (if applicable):
1.9.2-129.el6_5.4.x86_64

How reproducible:
unknown

Steps to Reproduce:
1. N/A

Actual results:
sssd stops running

In /var/log/messages it looks like this:
2014-05-28T17:04:18.770537-03:00 brtlvlts0209sl sssd[nss]: Shutting down
2014-05-28T17:04:18.781474-03:00 brtlvlts0209sl sssd[pam]: Shutting down


sssd.log
(Wed May 28 17:04:18 2014) [sssd] [mt_svc_exit_handler] (0x0010): Process
[redecorp], definitely stopped!
(Wed May 28 17:04:18 2014) [sssd] [monitor_quit] (0x0040): Returned with: 1
(Wed May 28 17:04:18 2014) [sssd] [monitor_quit] (0x0020): Terminating
[nss][33655]
(Wed May 28 17:04:18 2014) [sssd] [monitor_quit] (0x0020): Child [nss]
terminated with a signal
(Wed May 28 17:04:18 2014) [sssd] [monitor_quit] (0x0020): Terminating
[pam][9893]
(Wed May 28 17:04:18 2014) [sssd] [monitor_quit] (0x0020): Child [pam] exited
gracefully
(Wed May 28 17:04:18 2014) [sssd] [sbus_remove_watch] (0x2000):
0x1b09560/0x1b17f10

Expected results:
sssd does not stop running

Additional info:
providing sosreport with level 9 debug logs from the incident

Fields changed

owner: somebody => mzidek
review: True => 0

This bug exists only in the sssd-1-9 branch and is reproducible with the following example configuration.

[domain/ad.example.test]
cache_credentials = True
id_provider = ad
ad_domain = ad.example.test
krb5_canonicalize = false
auth_provider = ad
chpass_provider = ad
access_provider = ad
ldap_schema = ad

ldap_access_order = expire
ldap_account_expire_policy = ad
ldap_force_upper_case_realm = true
ldap_referrals = false
ldap_uri = ldap://10.34.45.56
ldap_id_use_start_tls = False

krb5_realm = AD.EXAMPLE.TEST
krb5_kpasswd = ad.example.test
krb5_server = ad.example.test

ldap_id_mapping = true
ldap_idmap_default_domain = ad.example.test
ldap_idmap_autorid_compat = true
ldap_idmap_range_min = 100000
ldap_idmap_range_max = 2000100000
ldap_idmap_range_size = 2000000000

Two users should be members of a universal group from a different AD domain. In my case, the following sequence of id commands caused the crash:

id domaintest3user
id administrator
id domaintest2user

The problem was that a newly added slice was not removed from the linked list on failure: when there was no space for the new slice, the slice was simply released, but its entry stayed in the list. The next lookup of a domain SID in the linked list then accessed the freed memory, which in most cases caused a crash.

owner: mzidek => lslebodn

This problem was fixed as part of the refactoring in ticket #1844; the commit is in all versions of SSSD newer than sssd-1.9.92:
- 46222e5

Fields changed

resolution: => worksforme
status: new => closed

Metadata Update from @dpal:
- Issue assigned to lslebodn
- Issue set to the milestone: SSSD 1.11.7

7 years ago

SSSD is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in SSSD's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/SSSD/sssd/issues/3426

If you want to receive further updates on this issue, please navigate to the GitHub issue and click the Subscribe button.

Thank you for your understanding. We apologize for any inconvenience.
