Hi,
As requested on sssd-users, I'm putting in a new ticket for this. I think it may be the same issue as reported in:
https://pagure.io/SSSD/sssd/issue/3869
https://pagure.io/SSSD/sssd/issue/3886
We are seeing an issue on a large minority of our machines - possibly around a third out of hundreds of deployed hosts - where group memberships stop updating on clients, i.e. the output of getent group <groupname> is out of date. The only fix is to stop sssd, remove the cache file and then start sssd again.
This only affects groups - we also use sssd for netgroup and passwd and they both seem fine.
Our back ends are openldap servers and our groups use posixGroup object class.
OS is Scientific Linux 7.5; sssd version is 1.16.2-13.el7.
Please let me know what further information you would like to see - i.e. what logs and what debug level. Also if you want to see the contents of the ldb caches. I will need to anonymise all such information.
Cheers Toby
I would like to ask for the information as described here: https://docs.pagure.org/SSSD.sssd/users/troubleshooting.html
It would be best to pinpoint the issue in the logs as close as possible. e.g. which group was reporting wrong members, when did the issue happen so we can correlate the logs with the timestamp etc.
Hi there,
I've attached the log file (debug level 6), covering the period from systemctl start sssd up to and including a getent group groupname. I've also attached the group record from the cache, and the result of a getent on a working machine (bolt) and on the non-working machine (dilley).
All names have been anonymised. The group should have 5 members. 'newuser' is the one not being returned.
Let me know if there's anything else you need.
Attachments: getent-group-groupname, ldbsearch-groupname, sssd_INF.log.anon
can you add the cache entries for user2@inf, user4@inf and newuser@inf as well?
As you can see from the group cache entry, all displayed members are listed in either ghost or memberuid, which is expected, but newuser is missing. I'd like to understand whether newuser@inf is already in the cache and, if it is, what the difference between the cached entries might be.
bye, Sumit
Hi Sumit,
Thanks for your reply. I can confirm that all of user2@inf, user4@inf and newuser@inf have entries in the cache (see attached). user2 and user4 both have memberof entries for the group:
memberof: name=groupname@inf,cn=groups,cn=INF,cn=sysdb
... whereas newuser@inf does not.
Attachment: user-cache-entries.anon
The missing memberOf is more or less expected, because memberOf and memberuid should be set in the same run for the two objects.
Would it be possible to send the output with all attributes of the three users? You can of course sanitize text values but it would be helpful if you can keep numerical values like timestamps unmodified.
I've attached the complete cache output for all three users. I have replaced any potential identifiers with the text 'ANON'.
Attachment: user-cache-entries-complete.anon
Thank you for the data. Unfortunately I still cannot reproduce the issue. Do I understand correctly that if you add a new member to an existing group on the server, and then either wait on the client until the cache entry for this group has expired or call sss_cache -E, the new member is still not shown?
Would it be possible to add a debug log file similar to the one above but with debug_level=9 in the [domain/...] section of sssd.conf?
That is what I thought was the case: I had assumed that once a machine got into this state, all group memberships would fail to update. It appears that this isn't so.
I tested adding a user (me - username 'toby') to a different group, but a group which isn't updating on this machine. I did this after running sss_cache -g <group> and then sss_cache -E for good measure. Username 'toby' has been successfully added to this group. The group is not up to date, however - 2 users who were previously added are not in the group membership.
Sure, I can do this, but under what conditions - e.g. after an sssd restart? And with what activity - i.e. a group lookup of a known broken group - the original one with 5 members, as in my initial ticket?
Toby
An interesting update to my previous mail. I said that username 'toby' had successfully been added to a group which has not been updating correctly. 'toby' now does not show in the membership via getent group.
I wonder if this is what has been happening in all cases - i.e. membership is briefly updated, and then reverts. I'll see if I can find anything in the logs.
'toby' now does not show in the membership via getent group.
And now it's back again. I'm somewhat at a loss as to how to debug this. Can you suggest an approach?
About the log with debug_level=9: for a start, use the same setup as for the original log. Start SSSD and run getent group groupname, perhaps after sss_cache -E to make sure the cached entry is expired and the group is looked up on the LDAP server.
About your user coming and going from the group: there are two ways a user might be added to a group, either by getent group groupname or by id username. Since you are using your own user, I guess that some processes on the system run id username or an equivalent, which might add the user back to the group.
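The two lookup paths mentioned above can be compared directly on a client. The following is a minimal sketch using Python's standard `grp`, `pwd` and `os` modules, which go through the same NSS calls as getent group and id. The function names `compare_views` and `lookup` are made up for this illustration and are not part of SSSD.

```python
import grp
import os
import pwd


def compare_views(user, group_members, user_gids, gid):
    """Compare what a group lookup and a user lookup say about one membership.

    group_members: member names from a group lookup (like getent group).
    user_gids: numeric group IDs from a user lookup (like id username).
    gid: numeric ID of the group being checked.
    """
    in_group_lookup = user in group_members
    in_user_lookup = gid in user_gids
    return {
        "in_group_lookup": in_group_lookup,
        "in_user_lookup": in_user_lookup,
        "consistent": in_group_lookup == in_user_lookup,
    }


def lookup(user, group):
    """Query NSS for both views and compare them.

    Note: a user's primary group shows up in the user lookup but is normally
    not listed among the group's members, so that case reports as inconsistent.
    """
    g = grp.getgrnam(group)
    p = pwd.getpwnam(user)
    gids = os.getgrouplist(user, p.pw_gid)
    return compare_views(user, g.gr_mem, gids, g.gr_gid)
```

Calling `lookup("toby", "groupname")` repeatedly on an affected host, with the real group name substituted, would show whether the two views disagree at the moments when the user disappears from getent group.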
OK, attached is the log (debug level 9 for the domain) covering the following:
systemctl stop sssd
systemctl start sssd
sss_cache -E
getent group groupname
Attachment: dilley-sssd_INF.log.anon
Hi there, is there any more information I can provide to help with debugging this? It's an ongoing issue.
I'm sorry this is taking so long. But unfortunately I can't reproduce the behaviour either.
Could you please do one more experiment for me? Before updating the group, could you run an ldapsearch for the group object and record the modifyTimestamp attribute value. Then, run ldbsearch for both local sssd caches (/var/lib/sss/db/cache_domain and /var/lib/sss/db/timestamps_domain). Again, I'm mostly interested in the modifyTimestamp value. Finally, add the member, and record the server side modifyTimestamp again. Did it change on the server side? Expire the cache or wait for it to expire, getent group and check the modifyTimestamp in the caches again.
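The comparison requested here can be sketched as a small helper that takes the text output of ldapsearch and ldbsearch and checks whether the timestamps agree. This is only an illustration. It assumes the cache stores the server value under originalModifyTimestamp, as the cache records in this ticket do, and it does not handle folded or base64-encoded LDIF lines. The names `attr_values` and `cache_is_current` are hypothetical.

```python
def attr_values(record, name):
    """Return all values of attribute `name` from LDIF-style text.

    Simplified parser: one "attribute: value" pair per line, no line
    folding and no base64 ("::") values.
    """
    values = []
    for line in record.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == name.lower():
            values.append(value.strip())
    return values


def cache_is_current(server_record, cache_record):
    """True if the cached originalModifyTimestamp matches the server's
    modifyTimestamp for the same group object."""
    server = attr_values(server_record, "modifyTimestamp")
    cached = attr_values(cache_record, "originalModifyTimestamp")
    return bool(server) and server == cached
```

Running this before the change, after the change on the server, and again after the cache has been refreshed gives three data points that show whether the client picked up the new modifyTimestamp.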
Hi, thanks for getting back to me on this (and also my apologies for the delay in replying).
I've attempted to gather the information you requested. Unfortunately I do have to protect the names of our users, so will have to obfuscate some of the data. In this case the group is the same name as a user on our systems (who should be in the group, but is missing), but I have chosen this group as in our LDAP it has only two members and getent on the machine in question is reporting only one.
I will obfuscate as follows.
Group name: wXXXXy
Username1: sXXXXX2
Username2: wXXXXy
Group membership should report:
wXXXXy:*:12345:wXXXXy,sXXXXX2
... but on a broken machine, it reports:
wXXXXy:*:12345:sXXXXX2
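To check many hosts for this symptom, the expected and reported getent group lines can be compared mechanically. A minimal sketch follows, assuming the standard colon-separated group format shown above. The function names are illustrative only.

```python
def parse_group_line(line):
    """Split a getent group line into (name, gid, set of members)."""
    name, _passwd, gid, members = line.strip().split(":", 3)
    return name, int(gid), {m for m in members.split(",") if m}


def missing_members(expected_line, actual_line):
    """Members present in the expected line but absent from the actual one."""
    expected = parse_group_line(expected_line)[2]
    actual = parse_group_line(actual_line)[2]
    return sorted(expected - actual)
```

For the two lines above, `missing_members("wXXXXy:*:12345:wXXXXy,sXXXXX2", "wXXXXy:*:12345:sXXXXX2")` returns `["wXXXXy"]`.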
Here is the information you request:
ldapsearch of group record:
dn: cn=wXXXXy,ou=Group,BASE
gidNumber: 12345
cn: wXXXXy
objectClass: top
objectClass: posixGroup
structuralObjectClass: posixGroup
createTimestamp: 20180712144017Z
memberUid: wXXXXy
memberUid: sXXXXX2
entryCSN: 20190909091634.670535Z#000000#000#000000
modifyTimestamp: 20190909091634Z
entryDN: cn=wXXXXy,ou=Group,BASE
subschemaSubentry: cn=Subschema
hasSubordinates: FALSE
Here is what it looks like in cache_INF.ldb:
dn: name=wXXXXy@inf,cn=groups,cn=INF,cn=sysdb
createTimestamp: 1557846478
gidNumber: 12345
name: wXXXXy@inf
objectCategory: group
isPosix: TRUE
originalDN: cn=wXXXXy,ou=Group,BASE
member: name=wXXXXy@inf,cn=users,cn=INF,cn=sysdb
nameAlias: wXXXXy@inf
originalModifyTimestamp: 20190909091634Z
entryUSN: 20190909091634Z
ghost: sXXXXX2@inf
lastUpdate: 1568110621
dataExpireTimestamp: 1568112421
distinguishedName: name=wXXXXy@inf,cn=groups,cn=INF,cn=sysdb
There is no record in timestamps_INF.ldb
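The lastUpdate and dataExpireTimestamp values in the cache record are Unix epoch seconds, so they are easier to correlate with log timestamps once converted. A small sketch using only the Python standard library (the helper name `cache_window` is made up here):

```python
from datetime import datetime, timezone


def cache_window(last_update, expire):
    """Convert the epoch values from a cache record to UTC datetimes and
    report how long the entry is considered valid."""
    return {
        "updated": datetime.fromtimestamp(last_update, tz=timezone.utc),
        "expires": datetime.fromtimestamp(expire, tz=timezone.utc),
        "ttl_seconds": expire - last_update,
    }
```

For the record above, `cache_window(1568110621, 1568112421)` gives a `ttl_seconds` of 1800, with the last update at 2019-09-10 10:17:01 UTC.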
... after expiring cache...
Next, I update the group to add myself to it (I won't bother obfuscating my username 'toby').
Here's the ldap record:
dn: cn=wXXXXy,ou=Group,BASE
gidNumber: 12345
cn: wXXXXy
objectClass: top
objectClass: posixGroup
structuralObjectClass: posixGroup
createTimestamp: 20180712144017Z
memberUid: wXXXXy
memberUid: sXXXXX2
memberUid: toby
entryCSN: 20190910134617.434872Z#000000#000#000000
modifyTimestamp: 20190910134617Z
entryDN: cn=wXXXXy,ou=Group,BASE
subschemaSubentry: cn=Subschema
hasSubordinates: FALSE
getent reports the added user successfully (but still not the original missing one):
[dilley]root: getent group wXXXXy
wXXXXy:*:12345:toby,sXXXXX2
[dilley]root:
And, here is the record in cache_INF.ldb:
dn: name=wXXXXy@inf,cn=groups,cn=INF,cn=sysdb
createTimestamp: 1557846478
gidNumber: 12345
name: wXXXXy@inf
objectCategory: group
isPosix: TRUE
originalDN: cn=wXXXXy,ou=Group,BASE
nameAlias: wXXXXy@inf
ghost: sXXXXX2@inf
originalModifyTimestamp: 20190910134617Z
entryUSN: 20190910134617Z
member: name=wXXXXy@inf,cn=users,cn=INF,cn=sysdb
member: name=toby@inf,cn=users,cn=INF,cn=sysdb
lastUpdate: 1568123725
dataExpireTimestamp: 1568125525
memberuid: toby@inf
distinguishedName: name=wXXXXy@inf,cn=groups,cn=INF,cn=sysdb
Again, there is no group record in timestamps_INF.ldb
What leaps out for me here is that the user who does appear in the getent output (sXXXXX2) is in the group cache record only as 'ghost', the user who doesn't appear in the getent output (wXXXXy) is in only as 'member', and the newly added user (toby) has both 'memberuid' and 'member'.
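The same breakdown can be produced for any cached group record with a short script. The sketch below only tabulates which of the ghost, member and memberuid attributes mention each user in ldbsearch-style text. It makes no claim about how SSSD itself assembles the getent output, and the function name `member_sources` is hypothetical.

```python
def member_sources(cache_record):
    """Map each user name to the set of membership attributes naming it.

    Expects one "attribute: value" pair per line, as in ldbsearch output.
    """
    sources = {}
    for line in cache_record.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        value = value.strip()
        if not sep or key not in ("ghost", "member", "memberuid"):
            continue
        if key == "member":
            # member holds a DN such as name=toby@inf,cn=users,cn=INF,cn=sysdb;
            # keep only the value of the leading name= component.
            value = value.split(",", 1)[0].partition("=")[2]
        sources.setdefault(value, set()).add(key)
    return sources
```

A user listed only under member, with no matching memberuid, matches the pattern of the missing user in this ticket, so this output may help spot other affected groups.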
Metadata Update from @pbrezina: - Issue tagged with: Canditate to close
Thank you for taking time to submit this request for SSSD. Unfortunately this issue was not given priority and the team lacks the capacity to work on it at this time.
Given that we are unable to fulfill this request I am closing the issue as wontfix.
If the issue still persists on a recent SSSD, you can request reconsideration of this decision by reopening this issue. Please provide additional technical details about its importance to you.
Thank you for understanding.
Metadata Update from @pbrezina: - Issue close_status updated to: wontfix - Issue status updated to: Closed (was: Open)
SSSD is moving from Pagure to GitHub. This means that new issues and pull requests will be accepted only in SSSD's GitHub repository.
This issue has been cloned to GitHub and is available here: https://github.com/SSSD/sssd/issues/4981
If you want to receive further updates on the issue, please navigate to the GitHub issue and click the subscribe button.
Thank you for understanding. We apologize for any inconvenience.