#4010 group membership not updating
Closed: wontfix a year ago by pbrezina. Opened 2 years ago by tobyblake.

Hi,

As requested on sssd-users, I'm putting in a new ticket for this. I think it may be the same issue as reported in:

https://pagure.io/SSSD/sssd/issue/3869
https://pagure.io/SSSD/sssd/issue/3886

We are seeing an issue on a large minority of our machines - possibly around a third out of hundreds of deployed hosts - where group memberships stop updating on clients, i.e. the output of getent group <groupname> is out of date. The only fix is to stop sssd, remove the cache file and then start sssd again.
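
For anyone following along, a minimal sketch of that workaround. It assumes systemd manages sssd and that the cache files live under /var/lib/sss/db with the cache_*.ldb and timestamps_*.ldb names that appear later in this ticket. The function name and the RUN and SSSD_DB_DIR variables are made up for illustration. By default it only prints the commands; removing the timestamps file along with the cache is also an assumption.

```shell
#!/bin/sh
# Sketch of the stop / remove cache / start workaround (hypothetical helper).
# RUN defaults to "echo", so the commands are only printed.
# Set RUN to an empty string (RUN= ) and run as root to really execute them.
reset_sssd_cache() {
    run=${RUN-echo}
    db_dir=${SSSD_DB_DIR:-/var/lib/sss/db}   # assumed cache location
    $run systemctl stop sssd
    # Assumption: clear both the cache and its companion timestamps file.
    $run rm -f "$db_dir"/cache_*.ldb "$db_dir"/timestamps_*.ldb
    $run systemctl start sssd
}
```

Calling reset_sssd_cache as-is shows what would be run; `RUN= reset_sssd_cache` performs the reset.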

This only affects groups - we also use sssd for netgroup and passwd and they both seem fine.

Our back ends are openldap servers and our groups use posixGroup object class.

OS is Scientific Linux 7.5
sssd version is 1.16.2-13.el7

Please let me know what further information you would like to see - i.e. what logs and what debug level. Also if you want to see the contents of the ldb caches. I will need to anonymise all such information.

Cheers
Toby


I would like to ask for the information as described here:
https://docs.pagure.org/SSSD.sssd/users/troubleshooting.html

It would be best to pinpoint the issue in the logs as close as possible. e.g. which group was reporting wrong members, when did the issue happen so we can correlate the logs with the timestamp etc.

Hi there,

I've attached the log file (debug level 6), covering everything from systemctl start sssd up to and including a getent group groupname. I've also attached the group record from the cache, and the result of a getent on a working machine (bolt) and on the non-working machine (dilley).

All names have been anonymised. The group should have 5 members. 'newuser' is the one not being returned.

Let me know if there's anything else you need.

Cheers
Toby

getent-group-groupnameldbsearch-groupnamesssd_INF.log.anon

Hi,

can you add the cache entries for user2@inf, user4@inf and newuser@inf as well?

As you can see from the group cache entry, all displayed members are listed in either ghost or memberuid, which is expected, but newuser is missing. I'd like to understand whether newuser@inf is already in the cache or not and, if it is, what the difference between the cached entries might be.

bye,
Sumit

Hi Sumit,

Thanks for your reply. I can confirm that all of user2@inf, user4@inf and newuser@inf have entries in the cache (see attached). user2 and user4 both have memberof entries for the group:

memberof: name=groupname@inf,cn=groups,cn=INF,cn=sysdb

... whereas newuser@inf does not.

Cheers
Toby

user-cache-entries.anon

Hi,

the missing memberOf is kind of expected because memberOf and memberuid should be set in the same run for the two objects.

Would it be possible to send the output with all attributes of the three users? You can of course sanitize text values but it would be helpful if you can keep numerical values like timestamps unmodified.

bye,
Sumit

Hi,

I've attached the complete cache output for all three users. I have replaced any potential identifiers with the text 'ANON'.

Cheers
Toby

user-cache-entries-complete.anon

Hi,

thank you for the data, unfortunately I still cannot reproduce the issue. Do I understand correctly that if you add a new member of an existing group on the server and wait on the client until the cache entry for this group is expired or call sss_cache -E the new member is still not shown?

Would it be possible to add a debug log file similar to the one above but with debug_level=9 in the [domain/...] section of sssd.conf?
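
For reference, the requested setting would look roughly like this in /etc/sssd/sssd.conf. The section name [domain/INF] is an assumption based on the sssd_INF.log and cache_INF.ldb file names seen in this ticket; use the domain name from your own configuration and restart sssd afterwards.

```ini
# Assumed domain name "INF"; only the debug_level line is the requested change.
[domain/INF]
debug_level = 9
```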

bye,
Sumit

Hi,

> thank you for the data, unfortunately I still cannot reproduce the issue. Do I understand correctly that if you add a new member of an existing group on the server and wait on the client until the cache entry for this group is expired or call sss_cache -E the new member is still not shown?

That is what I thought was the case: I had assumed that once a machine got into this state, all group memberships would fail to update. It appears that this isn't so.

I tested adding a user (me - username 'toby') to a different group, one which also isn't updating on this machine. I did this after running sss_cache -g <group> and then sss_cache -E for good measure. Username 'toby' has been successfully added to this group. The group is still not up to date, however - 2 users who were previously added are not in the group membership.

> Would it be possible to add a debug log file similar to the one above but with debug_level=9 in the [domain/...] section of sssd.conf?

Sure, I can do this, but under what conditions - e.g. after an sssd restart? And with what activity - i.e. a group lookup of a known broken group - the original one with 5 members, as in my initial ticket?

Toby

Hi Sumit,

An interesting update to my previous mail. I said that username 'toby' had successfully been added to a group which has not been updating correctly. 'toby' now does not show in the membership via getent group.

I wonder if this is what has been happening in all cases - i.e. membership is briefly updated, and then reverts. I'll see if I can find anything in the logs.

Toby

> 'toby' now does not show in the membership via getent group.

And now it's back again. I'm somewhat at a loss as to how to debug this. Can you suggest an approach?

Toby

Hi,

About the log with debug_level=9: for a start, use the same setup as for the original log. Start SSSD and then run getent group groupname, maybe after sss_cache -E to make sure the cached entry is expired and the group is looked up on the LDAP server.

About your user coming and going from the group: there are two ways a user might be added to a group, either by getent group groupname or by id username. I guess that, since you are using your own user, some processes on the system will run id username or an equivalent, which might add the user back to the group.

bye,
Sumit

Hi,

OK, attached is log (debug level 9 for domain) following:

systemctl stop sssd
systemctl start sssd
sss_cache -E
getent group groupname

Toby

dilley-sssd_INF.log.anon

Hi there, is there any more information I can provide to help with debugging this? It's an ongoing issue.

Toby

I'm sorry this is taking so long. But unfortunately I can't reproduce the behaviour either.

Could you please do one more experiment for me? Before updating the group, could you run an ldapsearch for the group object and record the modifyTimestamp attribute value. Then, run ldbsearch for both local sssd caches (/var/lib/sss/db/cache_domain and /var/lib/sss/db/timestamps_domain). Again, I'm mostly interested in the modifyTimestamp value. Finally, add the member, and record the server-side modifyTimestamp again. Did it change on the server side? Expire the cache or wait for it to expire, run getent group, and check the modifyTimestamp in the caches again.
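
The comparison asked for above could be scripted roughly as follows. This is only a sketch: it assumes the ldapsearch and ldbsearch output has already been saved to two text files, and that the cache stores the server value under originalModifyTimestamp (as in the cache records shown in this ticket). The function names are invented for illustration.

```shell
#!/bin/sh
# ldif_attr ATTR: print the value(s) of ATTR from LDIF-style text on stdin.
ldif_attr() {
    awk -v attr="$1" -F': ' 'tolower($1) == tolower(attr) { print $2 }'
}

# compare_timestamps SERVER_FILE CACHE_FILE
# SERVER_FILE: saved ldapsearch output for the group (must include modifyTimestamp).
# CACHE_FILE:  saved ldbsearch output for the same group from cache_<domain>.ldb.
compare_timestamps() {
    server=$(ldif_attr modifyTimestamp < "$1")
    cached=$(ldif_attr originalModifyTimestamp < "$2")
    if [ -z "$server" ] || [ -z "$cached" ]; then
        echo "missing timestamp: server='$server' cache='$cached'"
        return 1
    fi
    if [ "$server" = "$cached" ]; then
        echo "in sync: $server"
    else
        echo "stale: server=$server cache=$cached"
    fi
}
```

Running it once before and once after the membership change (and after expiring the cache) shows whether the client picked up the new server-side timestamp.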

Hi, thanks for getting back to me on this (and also my apologies for
the delay in replying).

I've attempted to gather the information you requested. Unfortunately
I do have to protect the names of our users, so will have to obfuscate
some of the data. In this case the group has the same name as a user
on our systems (who should be in the group, but is missing). I have
chosen this group because in our LDAP it has only two members and
getent on the machine in question is reporting only one.

I will obfuscate as follows.

Group name: wXXXXy
Username1: sXXXXX2
Username2: wXXXXy

Group membership should report:

wXXXXy:*:12345:wXXXXy,sXXXXX2

... but on a broken machine, it reports:

wXXXXy:*:12345:sXXXXX2
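
A small sketch for spotting the difference between two such lines, for example one copied from a working machine and one from a broken machine. The function names are hypothetical; the input is just the getent group output line passed as a string.

```shell
#!/bin/sh
# group_members LINE: print the members of a getent-style group line,
# one per line, sorted.
group_members() {
    printf '%s\n' "$1" | cut -d: -f4 | tr ',' '\n' | sed '/^$/d' | sort
}

# missing_members EXPECTED_LINE ACTUAL_LINE
# Print every member present in EXPECTED_LINE but absent from ACTUAL_LINE.
missing_members() {
    actual=$(group_members "$2")
    group_members "$1" | while IFS= read -r m; do
        printf '%s\n' "$actual" | grep -qxF -- "$m" || printf '%s\n' "$m"
    done
}
```

With the two lines above as arguments, `missing_members 'wXXXXy:*:12345:wXXXXy,sXXXXX2' 'wXXXXy:*:12345:sXXXXX2'` prints wXXXXy.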

Here is the information you requested:

ldapsearch of group record:

wXXXXy, Group, inf.ed.ac.uk

dn: cn=wXXXXy,ou=Group,BASE
gidNumber: 12345
cn: wXXXXy
objectClass: top
objectClass: posixGroup
structuralObjectClass: posixGroup
createTimestamp: 20180712144017Z
memberUid: wXXXXy
memberUid: sXXXXX2
entryCSN: 20190909091634.670535Z#000000#000#000000
modifyTimestamp: 20190909091634Z
entryDN: cn=wXXXXy,ou=Group,BASE
subschemaSubentry: cn=Subschema
hasSubordinates: FALSE

Here is what it looks like in cache_INF.ldb:

record 1

dn: name=wXXXXy@inf,cn=groups,cn=INF,cn=sysdb
createTimestamp: 1557846478
gidNumber: 12345
name: wXXXXy@inf
objectCategory: group
isPosix: TRUE
originalDN: cn=wXXXXy,ou=Group,BASE
member: name=wXXXXy@inf,cn=users,cn=INF,cn=sysdb
nameAlias: wXXXXy@inf
originalModifyTimestamp: 20190909091634Z
entryUSN: 20190909091634Z
ghost: sXXXXX2@inf
lastUpdate: 1568110621
dataExpireTimestamp: 1568112421
distinguishedName: name=wXXXXy@inf,cn=groups,cn=INF,cn=sysdb

There is no record in timestamps_INF.ldb

... after expiring cache...

Next, I update the group to add myself to it (I won't bother
obfuscating my username 'toby').

Here's the ldap record:

wXXXXy, Group, inf.ed.ac.uk

dn: cn=wXXXXy,ou=Group,BASE
gidNumber: 12345
cn: wXXXXy
objectClass: top
objectClass: posixGroup
structuralObjectClass: posixGroup
createTimestamp: 20180712144017Z
memberUid: wXXXXy
memberUid: sXXXXX2
memberUid: toby
entryCSN: 20190910134617.434872Z#000000#000#000000
modifyTimestamp: 20190910134617Z
entryDN: cn=wXXXXy,ou=Group,BASE
subschemaSubentry: cn=Subschema
hasSubordinates: FALSE

getent reports the added user successfully (but still not the original
missing one):

[dilley]root: getent group wXXXXy
wXXXXy:*:12345:toby,sXXXXX2
[dilley]root:

And, here is the record in cache_INF.ldb:

record 1

dn: name=wXXXXy@inf,cn=groups,cn=INF,cn=sysdb
createTimestamp: 1557846478
gidNumber: 12345
name: wXXXXy@inf
objectCategory: group
isPosix: TRUE
originalDN: cn=wXXXXy,ou=Group,BASE
nameAlias: wXXXXy@inf
ghost: sXXXXX2@inf
originalModifyTimestamp: 20190910134617Z
entryUSN: 20190910134617Z
member: name=wXXXXy@inf,cn=users,cn=INF,cn=sysdb
member: name=toby@inf,cn=users,cn=INF,cn=sysdb
lastUpdate: 1568123725
dataExpireTimestamp: 1568125525
memberuid: toby@inf
distinguishedName: name=wXXXXy@inf,cn=groups,cn=INF,cn=sysdb

Again, there is no group record in timestamps_INF.ldb

I suppose what leaps out for me here is that the user who does appear
in the getent output (sXXXXX2) is only in the group cache record as
'ghost', the user who doesn't appear in getent output is only in as
'member', whereas the new record has both 'memberuid' and 'member'.
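
To make that pattern easier to see across many groups, the cache record can be boiled down to which attribute each name appears under. This is a rough sketch with invented function names; it reads saved ldbsearch output on stdin. The second function assumes, based only on what has been observed in this ticket, that getent shows the names found under ghost or memberuid.

```shell
#!/bin/sh
# cache_membership: read an ldbsearch group record on stdin and print
# "<attribute> <short-name>" for every member, ghost and memberuid value.
cache_membership() {
    awk -F': ' '
        $1 == "member"    { n = $2; sub(/^name=/, "", n); sub(/@.*/, "", n); print "member", n }
        $1 == "ghost"     { n = $2; sub(/@.*/, "", n); print "ghost", n }
        $1 == "memberuid" { n = $2; sub(/@.*/, "", n); print "memberuid", n }
    '
}

# nss_visible_members: names listed under ghost or memberuid.
# Assumption drawn from this thread: these are the names getent returns.
nss_visible_members() {
    cache_membership | awk '$1 != "member" { print $2 }' | sort -u
}
```

For the second cache record above, cache_membership lists wXXXXy only under member, while nss_visible_members prints sXXXXX2 and toby, matching the getent output.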

Metadata Update from @pbrezina:
- Issue tagged with: Canditate to close

a year ago

Thank you for taking time to submit this request for SSSD. Unfortunately this issue was not given priority and the team lacks the capacity to work on it at this time.

Given that we are unable to fulfill this request I am closing the issue as wontfix.

If the issue still persists on a recent SSSD, you can request reconsideration of this decision by reopening this issue. Please provide additional technical details about its importance to you.

Thank you for understanding.

Metadata Update from @pbrezina:
- Issue close_status updated to: wontfix
- Issue status updated to: Closed (was: Open)

a year ago

SSSD is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in SSSD's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/SSSD/sssd/issues/4981

If you want to receive further updates on the issue, please navigate to the github issue
and click on the subscribe button.

Thank you for understanding. We apologize for all inconvenience.
