#2634 sssd nss responder gets wrong number of secondary groups
Closed: Fixed None Opened 3 years ago by jbd.

Hello,

under certain circumstances (cold cache, empty cache) the nss responder answers with the wrong number of secondary groups when queried with a lot of parallel requests.

There has been a discussion about this issue on the mailing list :

https://lists.fedorahosted.org/pipermail/sssd-users/2015-April/002823.html

Someone else managed to reproduce the bug with my python script (Chris Petty, using ldap + ad; we only use ldap with the rfc2307 schema on our side), so I think it is a good time to open an official bug.

For me, that's a blocker (it means we cannot use sssd on our compute cluster, we'll use a dump of passwd/group/shadow) but feel free to adjust the priority of this bug.

I've got a test case that does not involve any third-party software. It is quite reproducible on my machine. Since it looks like a race, you may need to tweak the parameters of the python script.

The basic idea is to run a bunch of processes, each of which waits for a short
random amount of time before calling the initgroups libc function for a
specific user.

You have to log in as root rather than use sudo, to prevent the sssd cache from
being populated before the test is started. You also need to clean up the sssd
state before running the test.
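
For readers who do not have the attachment, here is a minimal sketch of what such a test could look like. It is not the attached initgroups.py: the function names (count_secondary, child, run) and the exit-code convention are assumptions made for illustration. It takes the same five arguments as the script and has to run as root, because initgroups changes the supplementary groups of the calling process.

```python
import os
import random
import sys
import time


def count_secondary(groups, primary_gid):
    """Count distinct group ids other than the primary gid."""
    return len(set(groups) - {primary_gid})


def child(login, primary_gid, expected, delay_ms):
    """Runs in a forked child: sleep, call initgroups(3), exit 0 on a match."""
    status = 2  # 2 = unexpected error (for example, not running as root)
    try:
        time.sleep(delay_ms / 1000.0)
        os.initgroups(login, primary_gid)  # requires root privileges
        found = count_secondary(os.getgroups(), primary_gid)
        status = 0 if found == expected else 1
        if status:
            print("wrong number of secondary groups in process %d : "
                  "%d instead of %d (sleep %dms)"
                  % (os.getpid(), found, expected, delay_ms))
    finally:
        sys.stdout.flush()
        os._exit(status)  # never return into the parent's loop


def run(login, primary_gid, expected, nproc, max_delay_ms):
    """Fork nproc children and return how many did not report a match."""
    pids = []
    for _ in range(nproc):
        # Draw the delay in the parent so forked children do not share RNG state.
        delay_ms = random.randint(0, max_delay_ms)
        pid = os.fork()
        if pid == 0:
            child(login, primary_gid, expected, delay_ms)
        pids.append(pid)
    failed = 0
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
            failed += 1
    return failed


if __name__ == "__main__" and len(sys.argv) == 6:
    login = sys.argv[1]
    gid, expected, nproc, max_delay = (int(a) for a in sys.argv[2:6])
    print("%d/%d failed" % (run(login, gid, expected, nproc, max_delay), nproc))
```

In this sketch a child that hits an error (such as a permission failure) is counted as failed too, so a run without root privileges reports every process as failed.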

usage:

## log in as root
## check the number of secondary groups for a user, using id for example
# id jbdenis

uid=21489(jbdenis) gid=110(sis) groups=110(sis),3044(CIB),19(floppy),1177(dump-projets),56(netadm),3125(vpn-ssl-admin)

Here I've got 5 secondary groups (sis is my primary group)
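
If you would rather not count by hand, the third parameter can be derived from the id output. This is a small sketch, assuming group names contain no spaces or parentheses (which may not hold for AD groups); the helper name secondary_groups is made up for this example.

```python
import re


def secondary_groups(id_output):
    """Return the names of the groups listed by id(1), minus the primary group."""
    gid = re.search(r"\bgid=(\d+)", id_output).group(1)
    groups_field = re.search(r"\bgroups=(\S+)", id_output).group(1)
    entries = re.findall(r"(\d+)\(([^)]*)\)", groups_field)
    return [name for num, name in entries if num != gid]


sample = ("uid=21489(jbdenis) gid=110(sis) "
          "groups=110(sis),3044(CIB),19(floppy),1177(dump-projets),"
          "56(netadm),3125(vpn-ssl-admin)")
print(len(secondary_groups(sample)))  # 5
```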

## !! VERY IMPORTANT !! clean up sssd state
# /etc/init.d/sssd stop && rm -f /var/lib/sss/mc/* /var/lib/sss/db/* && /etc/init.d/sssd start
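
The same cleanup can be scripted so it is never forgotten between two runs. The following is only a sketch of the command above, assuming the SysV init script path and cache locations shown there; the names cache_files and reset_sssd are hypothetical. It must be run as root.

```python
import glob
import os
import subprocess

# Cache locations taken from the rm command above.
CACHE_GLOBS = ("/var/lib/sss/mc/*", "/var/lib/sss/db/*")


def cache_files(globs=CACHE_GLOBS):
    """List the files matched by the cache glob patterns."""
    files = []
    for pattern in globs:
        files.extend(sorted(glob.glob(pattern)))
    return files


def reset_sssd(init_script="/etc/init.d/sssd"):
    """Stop sssd, delete its on-disk caches, then start it again."""
    subprocess.check_call([init_script, "stop"])
    for path in cache_files():
        os.remove(path)
    subprocess.check_call([init_script, "start"])
```

Calling reset_sssd() right before each test run gives the cold cache that the bug seems to need.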


## run this program
# python initgroups.py jbdenis 110 5 24 200
wrong number of secondary groups in process 17145 : 0 instead of 5 (sleep 55ms)
wrong number of secondary groups in process 17149 : 0 instead of 5 (sleep 55ms)
2/24 failed

# first parameter is a login
# second parameter is your primary gid (could be anything)
# third parameter is your number of secondary groups
# fourth parameter is the number of processes you want to run concurrently
# the last parameter is the maximum delay in milliseconds before calling
#   initgroups (the delay is randomized up to this maximum)

I've got good results with 24 processes and a randomized delay of up to 200ms
after startup. I guess those parameters depend somewhat on the machine you're
running the script on. You may have to run this test multiple times before
triggering the bug.

I'm unable to reproduce the bug when I use 0 delay, and I think that is why we
could reproduce it with our initial test case.

I've reproduced the bug with 1.12.4, 1.11.6 and 1.9.7.

Here is the output from Chris Petty on the mailing list :

I actually tried it and it was reproducible on my system using sssd 1.11.6 ( ad and ldap config ).

[root@dirac linux]# python initgroups.py cmp12 119549 95 24 200
wrongs number of secondary groups in process 4363 : 5 instead of 95 (sleep 78ms)
wrongs number of secondary groups in process 4366 : 5 instead of 95 (sleep 95ms)
wrongs number of secondary groups in process 4353 : 5 instead of 95 (sleep 90ms)
wrongs number of secondary groups in process 4362 : 5 instead of 95 (sleep 108ms)
wrongs number of secondary groups in process 4358 : 5 instead of 95 (sleep 110ms)
wrongs number of secondary groups in process 4371 : 5 instead of 95 (sleep 121ms)

Lukas was able to reproduce with the help of the reporter.

owner: somebody => lslebodn

Just to keep everything within this ticket :

We've got a "recipe" and configuration files to reproduce the bug from scratch,
on a vanilla CentOS 6 distro (the ldap part is inspired from
http://wiki.openiam.com/pages/viewpage.action?pageId=7635198)

# yum install sssd sssd-common openldap-servers openldap-clients perl-LDAP.noarch
# cp /usr/share/openldap-servers/DB_CONFIG.example /var/lib/ldap/DB_CONFIG
# chown -R ldap:ldap /var/lib/ldap
# cd /etc/openldap && mv slapd.d slapd.d.original
# cp /root/slapd-minimal.conf /etc/openldap/slapd.conf # use the one provided with this message
# chown ldap:ldap /etc/openldap/slapd.conf
# chmod 600 /etc/openldap/slapd.conf
# Add this line to /etc/sysconfig/ldap
SLAPD_OPTIONS="-h \"ldap://127.0.0.1 ldaps://127.0.0.1\""
# service slapd start
# chkconfig slapd on

Check that you can connect (the Manager password is "openldap") :

# ldapsearch -h localhost -x -w openldap -D 'cn=Manager,dc=example,dc=com' -b 'dc=example,dc=com' 'objectclass=*'

Time to populate our ldap server with our provided file (one user "user1" with
password "openldap" belonging to 29 secondary groups):

# ldapadd -h localhost -x -w openldap -D 'cn=Manager,dc=example,dc=com' -f /root/ldap-init.ldif

You can check that everything went fine with the previous ldapsearch command.

Copy our sssd configuration file:

# cp /root/sssd-minimal.conf /etc/sssd/sssd.conf
# chown root:root /etc/sssd/sssd.conf && chmod 600 /etc/sssd/sssd.conf
# service sssd start
# chkconfig sssd on
# # not sure if the authconfig is strictly necessary here
# authconfig --enablesssd --enablesssdauth --enablelocauthorize --enablemkhomedir --enablepamaccess --updateall --nostart
# service sssd restart

In /etc/nsswitch.conf, check for :

passwd:     files sss
shadow:     files sss
group:      files sss
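
This check can also be automated. The snippet below is a sketch that only looks for the sss source on the three databases listed above; the function name databases_using_sss is made up for this example, and it ignores anything else in the file.

```python
def databases_using_sss(nsswitch_text, databases=("passwd", "shadow", "group")):
    """Map each database name to True if 'sss' is among its sources."""
    result = {db: False for db in databases}
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        name, sources = line.split(":", 1)
        name = name.strip()
        if name in result:
            result[name] = "sss" in sources.split()
    return result


example = """\
passwd:     files sss
shadow:     files sss
group:      files sss
"""
print(databases_using_sss(example))
```

On a real machine you would pass it the content of /etc/nsswitch.conf, for example databases_using_sss(open("/etc/nsswitch.conf").read()).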





# cat /etc/sssd/sssd.conf
[sssd]
config_file_version = 2
services = nss, pam
domains = ldap_local

[nss]
filter_users = root,ldap,named,avahi,haldaemon,dbus,radiusd,news,nscd
override_shell = /bin/bash

[pam]


[domain/ldap_local]
override_homedir = /home/%u
auth_provider = ldap
ldap_schema = rfc2307
ldap_search_base = ou=people,dc=example,dc=com
ldap_group_search_base = ou=group,dc=example,dc=com
id_provider = ldap
ldap_uri = ldap://localhost/

You can now run your script or mine. Just adapt the initgroups.py call or use
the one provided with this message:

python initgroups.py user1 50001 29 $num_proc $delay

And run:

# ./run_initgroups.sh
Stopping sssd:                                             [  OK  ]
Starting sssd:                                             [  OK  ]
.wrongs number of secondary groups in process 17626 : 0 instead of 29 (sleep 16ms)
wrongs number of secondary groups in process 17630 : 0 instead of 29 (sleep 26ms)
wrongs number of secondary groups in process 17634 : 0 instead of 29 (sleep 49ms)
wrongs number of secondary groups in process 17615 : 0 instead of 29 (sleep 53ms)
4/24 failed

OR

# ./reproduce.sh
Stopping sssd:                                             [  OK  ]
Starting sssd:                                             [  OK  ]
wrongs number of secondary groups in process 15664 : 0 instead of 29 (sleep 10ms)
wrongs number of secondary groups in process 15672 : 0 instead of 29 (sleep 9ms)
wrongs number of secondary groups in process 15673 : 0 instead of 29 (sleep 10ms)
3/20 failed
Stopping sssd:                                             [  OK  ]
Starting sssd:                                             [  OK  ]
wrongs number of secondary groups in process 15747 : 0 instead of 29 (sleep 3ms)
wrongs number of secondary groups in process 15734 : 0 instead of 29 (sleep 4ms)
wrongs number of secondary groups in process 15735 : 0 instead of 29 (sleep 10ms)
wrongs number of secondary groups in process 15748 : 0 instead of 29 (sleep 3ms)
wrongs number of secondary groups in process 15743 : 0 instead of 29 (sleep 7ms)
wrongs number of secondary groups in process 15745 : 0 instead of 29 (sleep 7ms)
wrongs number of secondary groups in process 15736 : 0 instead of 29 (sleep 5ms)
wrongs number of secondary groups in process 15742 : 0 instead of 29 (sleep 4ms)
wrongs number of secondary groups in process 15731 : 0 instead of 29 (sleep 10ms)
wrongs number of secondary groups in process 15732 : 0 instead of 29 (sleep 14ms)
wrongs number of secondary groups in process 15739 : 0 instead of 29 (sleep 4ms)
wrongs number of secondary groups in process 15749 : 0 instead of 29 (sleep 4ms)

Fields changed

milestone: NEEDS_TRIAGE => SSSD 1.12.5

Fields changed

patch: 0 => 1
status: new => assigned

Fixed upstream. master:
- dca7411
- d0cc678
- fd60528
- 390de02
and sssd-1-12:
- cd4e784
- 9ae6567
- 521eb7c
- 21431d9
- c3d7e06
- 17f2f1c
- eb6be4e

resolution: => fixed
status: assigned => closed

Metadata Update from @jbd:
- Issue assigned to lslebodn
- Issue set to the milestone: SSSD 1.12.5

2 years ago
