#3362 fiter_users and filter_groups stop working properly in v 1.15

Created 21 days ago by akomlik
Modified 16 hours ago

After upgrade from 1.14 to 1.15 I noticed odd behavior of filter_* settings
If local user or group absent this works fine and prevents accounts that exist in LDAP from showing up in the host
Example of config:

# grep named /etc/sssd/sssd.conf 
filter_users = avahi,dbus,haldaemon,ldap,named,news,nscd,radiusd,root
filter_groups = named,stapdev,stapusr,uuidd

# grep sss /etc/nsswitch.conf
passwd:     files sss
shadow:     files sss
group:      files sss

Now without local named account the user does not exist as expected:

# grep named /etc/passwd /etc/group 
# id named
id: named: no such user

However, both user and group named still show up as enumerated "leaking:" from LDAP

# getent passwd named
# getent passwd | grep named:
named:*:25:25:Service Account named:/var/named:/bin/bash
# getent group  named
# getent group  | grep named
named:*:25:

Now, trying to add local user with running SSSD causes error and weird behavior. the user appears to be created but yet not found in local files.

# useradd -u 25 named
useradd: UID 25 is not unique
# getent passwd 25
named:*:25:25:Service Account named:/var/named:/bin/bash
# getent passwd named
named:*:25:25:Service Account named:/var/named:/bin/bash
# grep named /etc/passwd
# id named
uid=25(named) gid=25(named) groups=25(named)
# grep named /etc/passwd  /etc/group
#

After sssd restart it disappears

# service sssd restart
Redirecting to /bin/systemctl restart  sssd.service
# id named
id: named: no such user

After downgrade to sssd 1.14 all works as expected and user/group named never shows up from LDAP

sssd.conf

Attachments
sssd.conf - 2017-04-04 17:45:28 From Issue description Download
sssd.spec - 2017-04-04 19:52:28 Comment Download

What distribution do you use?

We use CentOS-7 here
I built RPM out of sssd-1.15 distribution using settings that match stock RPM spec file to make direct replacement

Does your specfile also enable the files domain? (or is there one more sssd_be process running with domain implicit_files) ?

No, only one sssd_be process

sssd.spec

However, both user and group named still show up as enumerated "leaking:" from LDAP

How did you found out they are leaked from LDAP?
Because of different home directory?

I would recommend to
increase debug_level (8 or 9) in "[nss]" section of sssd.conf
remove sssd cache rm -f /var/lib/sss/db/*
restart sssd
call getent passwd -s named
* check log files whether there is something about filtering users.

Now, trying to add local user with running SSSD causes error and weird behavior. the user appears to be created but yet not found in local files.

You didn't add local user with useradd -u 25 named because UID is not uniq. and therefore you cannot find it in files.

BTW It would be good to know details about version of sssd.
sssd-1.15 is not exact. Could you provide git hash?

Hmm, I can reproduce it with git master.
I assume it is caused by cache_req refactoring.

Thank you very much for bug report.

Glad you can reproduce it guys!
Yes, could tell local vs LDAP entry by GECOS field, home, etc.

I first stumble on this when our admins complained to me that on hosts they installed bind RPM after system reboot bind would not start complaining of missing user "named"
So I started digging into this and discovered this bug.
Now I understand what was happening then:
- bind RPM pre-install script would try to create local user named and that would result in error logged into /var/log/secure however user "appeared" to have been created just like in my example so bind would start right after that and run till system rebooted. But in reality no local file entry was created
- after system reboot ( = SSSD restart) since no local user was present and named was (semi) filtered from LDAP bind would not start.

I was pulling my hair for trying to figure this one out and then tried for the heck of it downgrade back to 1.14 and then all was good again. Upgraded to 1.15 - problem
I even built using latest sssd-1_15_2 git tag RPM yesterday only to discover the issue was still there

$ git log -1
commit 4c9419d98b89a6161a3dde11f9f80be39d12e72a
Author: Jakub Hrozek <jhrozek@redhat.com>
Date:   Wed Mar 15 17:53:24 2017 +0100

 Updating the translations for the 1.15.2 release
Edited 20 days ago by akomlik

Actually I cannot reproduce with git master.
I had a typo in sssd.conf filter_user vs filter_users

Oh! Not good.

Some additional points. In presence of local account both local and LDAP entries show up

[root@m0072487 akomlik]# service sssd start
[root@m0072487 akomlik]# getent group named
[root@m0072487 akomlik]# id named
id: named: no such user
[root@m0072487 akomlik]# grep named /etc/passwd /etc/group

[root@m0072487 akomlik]# service sssd stop
Redirecting to /bin/systemctl stop  sssd.service

[root@m0072487 akomlik]# yum install -y -q bind
[root@m0072487 akomlik]# grep named /etc/passwd /etc/group
/etc/passwd:named:x:25:25:Named:/var/named:/sbin/nologin
/etc/group:named:x:25:

[root@m0072487 akomlik]# rm -fr /var/lib/sss/db/* /var/log/sssd/*
[root@m0072487 akomlik]# service sssd start
Redirecting to /bin/systemctl start  sssd.service

[root@m0072487 akomlik]# getent group | grep named:
named:x:25:
named:*:25:
[root@m0072487 akomlik]# getent passwd | grep named:
named:x:25:25:Named:/var/named:/sbin/nologin
named:*:25:25:Service Account named:/var/named:/bin/bash
[root@m0072487 akomlik]#

I assume that the 1st getent group named didn't return anything because enumeration has not finished yet.

Either try to wait few seconds or test with disabled enumeration.

And try to follow document
https://fedorahosted.org/sssd/wiki/Troubleshooting.

Or provide reliable reproducer :-)

First "getent group named" did not return anything because of the filter_groups statement.

We're forced to use enumeration or else init_groups takes forever due dozen thousands of LDAP queries required to perform init_groups for a user who logs in (look at our sssd.conf sample attached with elaborate LDAP filter controlling host access). This leads to 30s first login delays on a LAN and >1min logins when LDAP server is connected over WAN link and is unacceptable.
With enumeration ON all groups are cached locally and logins are instantaneous.

I'm disabling enumeration and testing again:

  1. Starting point:
    - user/group named filtered out
    - enumerations turned off

    [root @m0072487 akomlik]# egrep 'named|^enumerate' /etc/sssd/sssd.conf
    filter_users = avahi,dbus,haldaemon,ldap,named,news,nscd,radiusd,root
    filter_groups = named,stapdev,stapusr,uuidd
    enumerate=FALSE

  2. Clean cache and restart

    [root @m0072487 akomlik]# service sssd stop; rm -fr /var/lib/sss/db/ /var/log/sssd/ ; service sssd start
    Redirecting to /bin/systemctl stop sssd.service
    Redirecting to /bin/systemctl start sssd.service

  3. Checking for group named returns nothing as expected

    [root @m0072487 akomlik]# getent -s sss group named

  4. Checking for GID 25 however returns LDAP entry

    [root @m0072487 akomlik]# getent -s sss group 25
    named:*:25:

  5. After this check bny name also returns entry

    [root @m0072487 akomlik]# getent -s sss group named
    named:*:25:

  6. Trying to create local group fails now despite the expectations.

    [root @m0072487 akomlik]# groupadd -g 25 named
    groupadd: group 'named' already exists

Lukas, would you like me to conduct online screen-sharing session so we can get to the bottom of this faster?

As I mentioned in original description, once I downgrade to 1.14 things are back to normal:
- no group is return either by name named or by GID 25 from LDAP
- I can create group and user no problem

[root@m0072487 SPECS]# yum -q -y downgrade -q `rpm -qa --queryformat "%{NAME}-1.14.0-43.el7 " "*sss*" "libipa*"`
[root@m0072487 SPECS]#  service sssd stop; rm -fr /var/lib/sss/db/* /var/log/sssd/* ; service sssd start
Redirecting to /bin/systemctl stop  sssd.service
Redirecting to /bin/systemctl start  sssd.service
[root@m0072487 SPECS]# getent group named
[root@m0072487 SPECS]# getent group 25
[root@m0072487 SPECS]# groupadd -g 25 named
[root@m0072487 SPECS]# useradd -u 25 -g named named
[root@m0072487 SPECS]#

Thank you very much. I missed the step with query for GID.

getent -s sss group 25 # sssd 1.14 did not return anything.
getent -s sss group named

We can reproduce it now.

Git bisect says that 1st broken commit is 4049b63f8c67ada17b453463b0451ca6be3d5de4 "nss: rewrite nss responder so it uses cache_req".

I expected that.

And here is the explanation why it worked in 1.14. We checked negative cache before the request but also after we got results from back-end

(Wed Apr  5 17:46:52 2017) [sssd[nss]] [nss_cmd_getbyid] (0x0400): Running command [18][SSS_NSS_GETPWUID] with id [1002].
(Wed Apr  5 17:46:52 2017) [sssd[nss]] [sss_ncache_check_str] (0x2000): Checking negative cache for [NCE/UID/1002]

(Wed Apr  5 17:46:52 2017) [sssd[nss]] [nss_cmd_getpwuid_search] (0x0100): Requesting info for [1002@LDAP]
(Wed Apr  5 17:46:52 2017) [sssd[nss]] [get_dp_name_and_id] (0x0400): Not a LOCAL view, continuing with provided values.
(Wed Apr  5 17:46:52 2017) [sssd[nss]] [sss_dp_issue_request] (0x0400): Issuing request for [0x42960a:1:1002@LDAP]
(Wed Apr  5 17:46:52 2017) [sssd[nss]] [sss_dp_get_account_msg] (0x0400): Creating request for [LDAP][0x1][BE_REQ_USER][1][idnumber=1002:-]
(Wed Apr  5 17:46:52 2017) [sssd[nss]] [sbus_add_timeout] (0x2000): 0x1156de0
(Wed Apr  5 17:46:52 2017) [sssd[nss]] [sss_dp_internal_get_send] (0x0400): Entering request [0x42960a:1:1002@LDAP]
(Wed Apr  5 17:46:52 2017) [sssd[nss]] [sbus_remove_timeout] (0x2000): 0x1156de0
(Wed Apr  5 17:46:52 2017) [sssd[nss]] [sbus_dispatch] (0x4000): dbus conn: 0x1155b10
(Wed Apr  5 17:46:52 2017) [sssd[nss]] [sbus_dispatch] (0x4000): Dispatching.
(Wed Apr  5 17:46:52 2017) [sssd[nss]] [sss_dp_get_reply] (0x1000): Got reply from Data Provider - DP error code: 0 errno: 0 error message: Success
(Wed Apr  5 17:46:52 2017) [sssd[nss]] [nss_cmd_getpwuid_search] (0x0100): Requesting info for [1002@LDAP]
(Wed Apr  5 17:46:52 2017) [sssd[nss]] [nss_cmd_getpwuid_search] (0x0400): Returning info for uid [1002@LDAP]
(Wed Apr  5 17:46:52 2017) [sssd[nss]] [sss_ncache_check_str] (0x2000): Checking negative cache for [NCE/USER/LDAP/user2@ldap]

(Wed Apr  5 17:46:52 2017) [sssd[nss]] [fill_pwent] (0x0400): User [user2@ldap] filtered out! (negative cache)
(Wed Apr  5 17:46:52 2017) [sssd[nss]] [sss_dp_req_destructor] (0x0400): Deleting request: [0x42960a:1:1002@LDAP]
18 days ago

Metadata Update from @jhrozek:
- Issue set to the milestone: SSSD 1.15.3

18 days ago

Metadata Update from @jhrozek:
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1440132

18 days ago

Metadata Update from @jhrozek:
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1440132

Issue linked to Bugzilla: Bug 1440132

4 days ago

Metadata Update from @fidencio:
- Issue assigned to fidencio

16 hours ago

Metadata Update from @fidencio:
- Custom field patch adjusted to on

Login to comment on this ticket.

on

https://bugzilla.redhat.com/show_bug.cgi?id=1440132

cancel