Learn more about these different git repos.
Other Git URLs
After upgrade from 1.14 to 1.15 I noticed odd behavior of filter_* settings If local user or group absent this works fine and prevents accounts that exist in LDAP from showing up in the host Example of config:
# grep named /etc/sssd/sssd.conf filter_users = avahi,dbus,haldaemon,ldap,named,news,nscd,radiusd,root filter_groups = named,stapdev,stapusr,uuidd # grep sss /etc/nsswitch.conf passwd: files sss shadow: files sss group: files sss
Now without local named account the user does not exist as expected:
# grep named /etc/passwd /etc/group # id named id: named: no such user
However, both user and group named still show up as enumerated "leaking:" from LDAP
# getent passwd named # getent passwd | grep named: named:*:25:25:Service Account named:/var/named:/bin/bash # getent group named # getent group | grep named named:*:25:
Now, trying to add local user with running SSSD causes error and weird behavior. the user appears to be created but yet not found in local files.
# useradd -u 25 named useradd: UID 25 is not unique # getent passwd 25 named:*:25:25:Service Account named:/var/named:/bin/bash # getent passwd named named:*:25:25:Service Account named:/var/named:/bin/bash # grep named /etc/passwd # id named uid=25(named) gid=25(named) groups=25(named) # grep named /etc/passwd /etc/group #
After sssd restart it disappears
# service sssd restart Redirecting to /bin/systemctl restart sssd.service # id named id: named: no such user
After downgrade to sssd 1.14 all works as expected and user/group named never shows up from LDAP
<img alt="sssd.conf" src="/SSSD/sssd/issue/raw/files/134205d6c25db2201b743351f39cbe2c8e75e0b8311f6c4f81904cdd11f8af86-sssd.conf" />
What distribution do you use?
We use CentOS-7 here I built RPM out of sssd-1.15 distribution using settings that match stock RPM spec file to make direct replacement
Does your specfile also enable the files domain? (or is there one more sssd_be process running with domain implicit_files) ?
No, only one sssd_be process
<img alt="sssd.spec" src="/SSSD/sssd/issue/raw/files/5df2ac2369eefa80e7b9967c3d37b7ca5f3bca775cad60e54141887176b09fa1-sssd.spec" />
How did you found out they are leaked from LDAP? Because of different home directory?
I would recommend to increase debug_level (8 or 9) in "[nss]" section of sssd.conf remove sssd cache rm -f /var/lib/sss/db/* restart sssd call getent passwd -s named * check log files whether there is something about filtering users.
rm -f /var/lib/sss/db/*
getent passwd -s named
You didn't add local user with useradd -u 25 named because UID is not uniq. and therefore you cannot find it in files.
useradd -u 25 named
BTW It would be good to know details about version of sssd. sssd-1.15 is not exact. Could you provide git hash?
Hmm, I can reproduce it with git master. I assume it is caused by cache_req refactoring.
Thank you very much for bug report.
Glad you can reproduce it guys! Yes, could tell local vs LDAP entry by GECOS field, home, etc.
I first stumble on this when our admins complained to me that on hosts they installed bind RPM after system reboot bind would not start complaining of missing user "named" So I started digging into this and discovered this bug. Now I understand what was happening then: - bind RPM pre-install script would try to create local user named and that would result in error logged into /var/log/secure however user "appeared" to have been created just like in my example so bind would start right after that and run till system rebooted. But in reality no local file entry was created - after system reboot ( = SSSD restart) since no local user was present and named was (semi) filtered from LDAP bind would not start.
I was pulling my hair for trying to figure this one out and then tried for the heck of it downgrade back to 1.14 and then all was good again. Upgraded to 1.15 - problem I even built using latest sssd-1_15_2 git tag RPM yesterday only to discover the issue was still there
$ git log -1 commit 4c9419d98b89a6161a3dde11f9f80be39d12e72a Author: Jakub Hrozek <jhrozek@redhat.com> Date: Wed Mar 15 17:53:24 2017 +0100 Updating the translations for the 1.15.2 release
Actually I cannot reproduce with git master. I had a typo in sssd.conf filter_user vs filter_users
filter_user
filter_users
Oh! Not good.
Some additional points. In presence of local account both local and LDAP entries show up
[root@m0072487 akomlik]# service sssd start [root@m0072487 akomlik]# getent group named [root@m0072487 akomlik]# id named id: named: no such user [root@m0072487 akomlik]# grep named /etc/passwd /etc/group [root@m0072487 akomlik]# service sssd stop Redirecting to /bin/systemctl stop sssd.service [root@m0072487 akomlik]# yum install -y -q bind [root@m0072487 akomlik]# grep named /etc/passwd /etc/group /etc/passwd:named:x:25:25:Named:/var/named:/sbin/nologin /etc/group:named:x:25: [root@m0072487 akomlik]# rm -fr /var/lib/sss/db/* /var/log/sssd/* [root@m0072487 akomlik]# service sssd start Redirecting to /bin/systemctl start sssd.service [root@m0072487 akomlik]# getent group | grep named: named:x:25: named:*:25: [root@m0072487 akomlik]# getent passwd | grep named: named:x:25:25:Named:/var/named:/sbin/nologin named:*:25:25:Service Account named:/var/named:/bin/bash [root@m0072487 akomlik]#
I assume that the 1st getent group named didn't return anything because enumeration has not finished yet.
getent group named
Either try to wait few seconds or test with disabled enumeration.
And try to follow document https://fedorahosted.org/sssd/wiki/Troubleshooting.
Or provide reliable reproducer :-)
First "getent group named" did not return anything because of the filter_groups statement.
We're forced to use enumeration or else init_groups takes forever due dozen thousands of LDAP queries required to perform init_groups for a user who logs in (look at our sssd.conf sample attached with elaborate LDAP filter controlling host access). This leads to 30s first login delays on a LAN and >1min logins when LDAP server is connected over WAN link and is unacceptable. With enumeration ON all groups are cached locally and logins are instantaneous.
I'm disabling enumeration and testing again:
Starting point: - user/group named filtered out - enumerations turned off
[root@m0072487 akomlik]# egrep 'named|^enumerate' /etc/sssd/sssd.conf filter_users = avahi,dbus,haldaemon,ldap,named,news,nscd,radiusd,root filter_groups = named,stapdev,stapusr,uuidd enumerate=FALSE
Clean cache and restart
[root@m0072487 akomlik]# service sssd stop; rm -fr /var/lib/sss/db/ /var/log/sssd/ ; service sssd start Redirecting to /bin/systemctl stop sssd.service Redirecting to /bin/systemctl start sssd.service
Checking for group named returns nothing as expected
[root@m0072487 akomlik]# getent -s sss group named
Checking for GID 25 however returns LDAP entry
[root@m0072487 akomlik]# getent -s sss group 25 named:*:25:
After this check bny name also returns entry
[root@m0072487 akomlik]# getent -s sss group named named:*:25:
Trying to create local group fails now despite the expectations.
[root@m0072487 akomlik]# groupadd -g 25 named groupadd: group 'named' already exists
Lukas, would you like me to conduct online screen-sharing session so we can get to the bottom of this faster?
As I mentioned in original description, once I downgrade to 1.14 things are back to normal: - no group is return either by name named or by GID 25 from LDAP - I can create group and user no problem
[root@m0072487 SPECS]# yum -q -y downgrade -q `rpm -qa --queryformat "%{NAME}-1.14.0-43.el7 " "*sss*" "libipa*"` [root@m0072487 SPECS]# service sssd stop; rm -fr /var/lib/sss/db/* /var/log/sssd/* ; service sssd start Redirecting to /bin/systemctl stop sssd.service Redirecting to /bin/systemctl start sssd.service [root@m0072487 SPECS]# getent group named [root@m0072487 SPECS]# getent group 25 [root@m0072487 SPECS]# groupadd -g 25 named [root@m0072487 SPECS]# useradd -u 25 -g named named [root@m0072487 SPECS]#
Thank you very much. I missed the step with query for GID.
getent -s sss group 25 # sssd 1.14 did not return anything. getent -s sss group named
We can reproduce it now.
Git bisect says that 1st broken commit is 4049b63 "nss: rewrite nss responder so it uses cache_req".
I expected that.
And here is the explanation why it worked in 1.14. We checked negative cache before the request but also after we got results from back-end
(Wed Apr 5 17:46:52 2017) [sssd[nss]] [nss_cmd_getbyid] (0x0400): Running command [18][SSS_NSS_GETPWUID] with id [1002]. (Wed Apr 5 17:46:52 2017) [sssd[nss]] [sss_ncache_check_str] (0x2000): Checking negative cache for [NCE/UID/1002] (Wed Apr 5 17:46:52 2017) [sssd[nss]] [nss_cmd_getpwuid_search] (0x0100): Requesting info for [1002@LDAP] (Wed Apr 5 17:46:52 2017) [sssd[nss]] [get_dp_name_and_id] (0x0400): Not a LOCAL view, continuing with provided values. (Wed Apr 5 17:46:52 2017) [sssd[nss]] [sss_dp_issue_request] (0x0400): Issuing request for [0x42960a:1:1002@LDAP] (Wed Apr 5 17:46:52 2017) [sssd[nss]] [sss_dp_get_account_msg] (0x0400): Creating request for [LDAP][0x1][BE_REQ_USER][1][idnumber=1002:-] (Wed Apr 5 17:46:52 2017) [sssd[nss]] [sbus_add_timeout] (0x2000): 0x1156de0 (Wed Apr 5 17:46:52 2017) [sssd[nss]] [sss_dp_internal_get_send] (0x0400): Entering request [0x42960a:1:1002@LDAP] (Wed Apr 5 17:46:52 2017) [sssd[nss]] [sbus_remove_timeout] (0x2000): 0x1156de0 (Wed Apr 5 17:46:52 2017) [sssd[nss]] [sbus_dispatch] (0x4000): dbus conn: 0x1155b10 (Wed Apr 5 17:46:52 2017) [sssd[nss]] [sbus_dispatch] (0x4000): Dispatching. (Wed Apr 5 17:46:52 2017) [sssd[nss]] [sss_dp_get_reply] (0x1000): Got reply from Data Provider - DP error code: 0 errno: 0 error message: Success (Wed Apr 5 17:46:52 2017) [sssd[nss]] [nss_cmd_getpwuid_search] (0x0100): Requesting info for [1002@LDAP] (Wed Apr 5 17:46:52 2017) [sssd[nss]] [nss_cmd_getpwuid_search] (0x0400): Returning info for uid [1002@LDAP] (Wed Apr 5 17:46:52 2017) [sssd[nss]] [sss_ncache_check_str] (0x2000): Checking negative cache for [NCE/USER/LDAP/user2@ldap] (Wed Apr 5 17:46:52 2017) [sssd[nss]] [fill_pwent] (0x0400): User [user2@ldap] filtered out! (negative cache) (Wed Apr 5 17:46:52 2017) [sssd[nss]] [sss_dp_req_destructor] (0x0400): Deleting request: [0x42960a:1:1002@LDAP]
Metadata Update from @jhrozek: - Issue set to the milestone: SSSD 1.15.3
Metadata Update from @jhrozek: - Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1440132
Issue linked to Bugzilla: Bug 1440132
Metadata Update from @fidencio: - Issue assigned to fidencio
PR opened: https://github.com/SSSD/sssd/pull/246
Metadata Update from @fidencio: - Custom field patch adjusted to on
Metadata Update from @jhrozek: - Issue priority set to: critical
Metadata Update from @jhrozek: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)
master:
So, I built sssd RPMs using latest snapshot and still see bizzare issues with user/group "leaking" from LDAP despite using filters
[root@m0072487 ~]# rpm -q sssd-common sssd-common-1.15.3-0.20170516.0926.git6a1da82.el7_3.x86_64 [root@m0072487 ~]# egrep 'named|enumerate' /etc/sssd/sssd.conf filter_users = avahi,dbus,haldaemon,ldap,named,news,nscd,radiusd,root filter_groups = named,stapdev,stapusr,uuidd enumerate = True
performing clean start:
[root@m0072487 ~]# service sssd stop ; rm -f /var/lib/sss/db/* /var/log/sssd/* ; service sssd start Redirecting to /bin/systemctl stop sssd.service Redirecting to /bin/systemctl start sssd.service
At first, things apper to be working as expected. No local or LDAP user/group named exists on a system:
[root@m0072487 ~]# getent passwd named [root@m0072487 ~]# getent group named
However, trying now to add local group named I get an error due to existing GID conflict:
[root@m0072487 ~]# groupadd -g 25 named groupadd: GID '25' already exists [root@m0072487 ~]# getent group named named:*:25: [root@m0072487 ~]#
After waiting for few minutes this phantom group disappears and things look normal:
[root@m0072487 ~]# getent group named [root@m0072487 ~]#
However, as soon as I try to create local group same error comes up again:
Next, I disable enumeration and test again. Now I can create local group but then it fails on user creation for same reason. Cache cleanup and restart makes it look good.
[root@m0072487 ~]# egrep 'named|enumerate' /etc/sssd/sssd.conf filter_users = avahi,dbus,haldaemon,ldap,named,news,nscd,radiusd,root filter_groups = named,stapdev,stapusr,uuidd enumerate = FALSE [root@m0072487 ~]# service sssd stop ; rm -f /var/lib/sss/db/* /var/log/sssd/* ; service sssd start Redirecting to /bin/systemctl stop sssd.service Redirecting to /bin/systemctl start sssd.service [root@m0072487 ~]# getent group named [root@m0072487 ~]# getent passwd named [root@m0072487 ~]# getent passwd 25 [root@m0072487 ~]# getent group 25 [root@m0072487 ~]# groupadd -g 25 named [root@m0072487 ~]# useradd -u 25 -g named named useradd: UID 25 is not unique [root@m0072487 ~]# getent passwd named named:*:25:25:Service Account named:/var/named:/bin/bash [root@m0072487 ~]# service sssd stop ; rm -f /var/lib/sss/db/* /var/log/sssd/* ; service sssd start Redirecting to /bin/systemctl stop sssd.service Redirecting to /bin/systemctl start sssd.service [root@m0072487 ~]# getent passwd named
I'm reopening the ticket as I'm able to reproduce the issue.
Metadata Update from @fidencio: - Issue status updated to: Open (was: Closed)
@akomlik, would you be willing to try a testing patch from me?
I'm kinda able to reproduce the issue but, unfortunately, not in a reliable way. I do have a blind fix that may help your case ...
@akomlik: Forget about testing. I've just learned that SSSD team do not prefer asking people to test our stuff.
Can you provide a 100% reliable reproducer?
FWIW, I think it's a good idea to ask reporters to test patches, as long as the NVR of the test package is sane so that they can later upgrade to a later version.
@akomlik,
I can reliable reproduce the issue when enumeration is enabled. However, I could reproduce only once when enumeration is not enabled.
May I ask you to also provide sssd_nss.log? (debug_level = 8 is the log level that I need).
Sure, I'll run the test again with log debug level =8 and attach it here.
I also noticed that with enumeration the issue is more pronounced vs without. We have to use enumeration or else resolving complex LDAP filter leads to huge amount of traffic to LDAP server so that simple login could take a minute even on fast network. Enumeration allows prepopulate all users and groups into local SSSD cache so that when admins log in to a host first time it happens quickly.
Super! I've noticed that we haven't been filtering out users when those are found directly in the cache (and that's the code path taken when using enumeration). So, for this one I do have a fix.
The case without enumeration still bugs me and the logs are going to help a lot.
Also, I do believe I've found a simpler reproducer for the enumeration case:
[root@client1 vagrant]# egrep 'user00|enumerate' /etc/sssd/sssd.conf enumerate = True filter_users = user00, user01 filter_groups = user00, user01 [root@client1 vagrant]# rm -rf /var/log/sssd/sssd* /var/lib/sss/db/* ; systemctl restart sssd [root@client1 vagrant]# id 1790400001 uid=1790400001(user00) gid=1790400001(user00) groups=1790400001(user00)
thanks. I just ran my test with enumeration off and it failed in same way:
[root@m0072487 /]# egrep 'named|enumerate' /etc/sssd/sssd.conf filter_users = avahi,dbus,haldaemon,ldap,named,news,nscd,radiusd,root filter_groups = named,stapdev,stapusr,uuidd enumerate = FALSE [root@m0072487 /]# service sssd stop ; rm -f /var/lib/sss/db/* /var/log/sssd/* ; service sssd start Redirecting to /bin/systemctl stop sssd.service Redirecting to /bin/systemctl start sssd.service root@m0072487 /]# getent passwd 25 [root@m0072487 /]# getent group named [root@m0072487 /]# getent group 25 [root@m0072487 /]# !groupad groupadd -g 25 named groupadd: GID '25' already exists [root@m0072487 /]# id 25 uid=25(named) gid=25(named) groups=25(named) [root@m0072487 /]#
Attaching the nss log <img alt="sssd_nss.log.gz" src="/SSSD/sssd/issue/raw/files/a2f4ce119e3998b8fa2cc9e475c4ceaa9a013257658ef8a8c0e1d7cb025f1563-sssd_nss.log.gz" />
Okay. I can reproduce it here following these steps.
My understanding of the problem is that it may happen in this case (where enumeration is disabled) because the first few getent calls done will end up adding the group to the cache and when we search by the group during the groupadd (or even a simple id 25 would trigger the issue) we don't filter it out when the result is coming from the cache (and here you can see the same issue with the enumeration).
id 25
I'll write a test and submit a PR for this issue.
If you want to give it a try to a patch, feel free: https://github.com/fidencio/sssd/tree/wip/3362
If you decide to test the patch (and I really would appreciate that), keep in mind that it's something not recommended/desired by, at least, part of the SSSD team.
Thanks. I'll give that patch a try first thing tomorrow
It does not scale. Reported will not test the same use-case after each commit in upstream. And without reliable reproducer we cannot write a test and prevent regressions in future.
Anyway, @akomlik thank you very much for bug report and time which you spent with it. I am so sorry for regressions which were introduced in 1.15. It happens with refactoring. We caught many crashes and regressions with existing tests. But it looks like filtering users/groups had very low test coverage. We need to improve it in upstream.
It does not scale. Reporters will not test the same use-case after each commit in upstream. And without reliable reproducer we cannot write a test and prevent regressions in future.
Greetings! I build new RPMs using the patch and still see the problem
[root@m0072487 /]# d 25 uid=25(named) gid=25 groups=25 [root@m0072487 /]# id named uid=25(named) gid=25 groups=25 [root@m0072487 /]#
<img alt="sssd_nss.log.gz" src="/SSSD/sssd/issue/raw/files/52247e8d25bfcc08abea372d9515548247a0e8f05ae810303e3b6d13f92c4cb6-sssd_nss.log.gz" />
Amazing! Thanks a lot for the testing and it shows me there's something else weird happening.
Metadata Update from @lslebodn: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)
@akomlik If you want we can prepare test build for you.
yes, please make it.
Here you are https://copr.fedorainfracloud.org/coprs/lslebodn/sssd-master/
great, I just tried it on my server and things are back to normal.
When can we expect next (1.15.3) release with this fix included?
I would expect a week or two.
I still don't see 1.15.3 release
SSSD is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in SSSD's github repository.
This issue has been cloned to Github and is available here: - https://github.com/SSSD/sssd/issues/4392
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
subscribe
Thank you for understanding. We apologize for all inconvenience.
Login to comment on this ticket.