#608 Race-condition in user/group enumeration
Closed: Fixed None Opened 10 years ago by sgallagh.

While working on Ticket #358 (netgroups support) I discovered that there's a race-condition present in the enumeration code for users and groups. I will use {{{grent}}} as the example, but this applies equally to {{{pwent}}}.

Right now, when we get a {{{setgrent}}} call in the NSS responder, we refresh the cache and prepare the response data for future {{{getgrent}}} calls. However, we attach this response data to a global context.

If a second enumeration request is made after the initial {{{setgrent}}} request, it will detect that the data is already available and may start returning values from the existing result set. Because both requests advance the same position in that shared result set, each enumeration can end up with an incomplete subset of the total enumeration result.

We need to make sure that the global context maintains a reference count of the number of client connections reading from the result set. We also need to move the index counter for the {{{getgrent}}} requests out to the client context (instead of the global context) so that each client will maintain its own tracking of where it is in the result set.
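The split described above could look roughly like the following. This is a minimal sketch, not the actual SSSD code: {{{struct enum_result}}}, {{{struct client_ctx}}} and the {{{client_*}}} helpers are hypothetical names chosen for illustration. The shared result set only carries a reference count, while the read position lives in each client's own context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared result set (stands in for the global context). */
struct enum_result {
    const char **names;   /* entries prepared when the cache was refreshed */
    size_t count;         /* number of entries in names */
    unsigned refcount;    /* number of clients currently reading */
};

/* Hypothetical per-client context: every client owns its own position. */
struct client_ctx {
    struct enum_result *res;  /* NULL while no enumeration is in progress */
    size_t index;             /* next entry to return to this client */
};

/* endgrent: drop this client's reference and reset its position. */
static void client_endgrent(struct client_ctx *c)
{
    if (c->res != NULL) {
        assert(c->res->refcount > 0);
        c->res->refcount--;
        c->res = NULL;
    }
    c->index = 0;
}

/* setgrent: take a reference on the shared result and rewind the cursor. */
static void client_setgrent(struct client_ctx *c, struct enum_result *res)
{
    client_endgrent(c);   /* a repeated setgrent must not leak a reference */
    res->refcount++;
    c->res = res;
    c->index = 0;
}

/* getgrent: return this client's next entry, or NULL when it is done. */
static const char *client_getgrent(struct client_ctx *c)
{
    if (c->res == NULL || c->index >= c->res->count) {
        return NULL;
    }
    return c->res->names[c->index++];
}
```

With this layout, two clients can interleave their {{{getgrent}}} calls in any order and each still sees the full list, because neither call touches the other client's {{{index}}}.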

In order to prevent the cache from getting stale due to a malicious or poorly designed client, we need to implement a configurable timeout for how long to wait for a client to complete its reading of the enumeration. If the client does not call {{{endgrent}}} within this time, we should forcibly remove its reference to the object and return an EIO error if the client ever tries to continue after that time.

Instead of returning an error I would rather refresh the cache and try to return results starting from the position counter as set in the client context.
If we are lucky and no accounts were changed, it will just keep returning the right data. Otherwise, at worst it will fail to return some entries, and possibly even report that there are no more entries (if the position counter goes beyond the number of currently available entries).

I think this would be better behaviour than returning EIO to an unlucky slow process. By slow I am thinking, for example, of a process reading something via NFS while enumerating users, with NFS being unbearably slow. That is not the process' fault.
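To make the proposed fallback concrete, here is a minimal sketch under stated assumptions. The structures and the helpers {{{expire_if_stale}}} and {{{next_entry}}} are hypothetical and are not taken from the SSSD sources. The idea shown is that the timeout drops the client's reference but keeps its position, and that the next request re-attaches to the refreshed result set and continues from that position instead of failing with EIO.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical result set produced by a cache refresh. */
struct enum_result {
    const char **names;
    size_t count;
};

/* Hypothetical per-client context. */
struct client_ctx {
    struct enum_result *res;  /* NULL once the reference has been dropped */
    size_t index;             /* position counter, kept across a timeout */
    time_t started;           /* when the client attached to res */
};

/*
 * Drop the client's reference if it has held it for longer than
 * `timeout` seconds. The index is deliberately left untouched.
 * Returns 1 if the reference was dropped, 0 otherwise.
 */
static int expire_if_stale(struct client_ctx *c, time_t now, time_t timeout)
{
    if (c->res != NULL && now - c->started > timeout) {
        c->res = NULL;
        return 1;
    }
    return 0;
}

/*
 * Return the client's next entry. If its reference was dropped,
 * re-attach to `fresh` (assumed to be the refreshed result set) and
 * carry on from the saved index. Returns NULL when the index is at or
 * past the end, which is the "no more entries" case.
 */
static const char *next_entry(struct client_ctx *c,
                              struct enum_result *fresh, time_t now)
{
    if (c->res == NULL) {
        c->res = fresh;
        c->started = now;
    }
    if (c->index >= c->res->count) {
        return NULL;
    }
    return c->res->names[c->index++];
}
```

If the refreshed set is shorter than the old one, a slow client may simply see the enumeration end early, which matches the trade-off described in the comment above.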


I attached a simple C file to demonstrate this bug. It will fork two processes, each attempting to perform an enumeration simultaneously.

The expected behaviour is that the parent and child lists will be identical. Right now they are not: each process receives only a subset of the complete list.

This program can be used to verify the fix when it is complete.

gcc -o getent_test getent_test.c
./getent_test 2>&1|grep Parent|wc -l; ./getent_test 2>&1|grep Child|wc -l

That second set of commands should output the same number twice on every run.

Fields changed

status: new => assigned

Source code for a program to reproduce the bug and verify the fix

Fixed by c53ed27

fixedin: => 1.4.0
resolution: => fixed
status: assigned => closed

Fields changed

rhbz: => 0

Metadata Update from @sgallagh:
- Issue assigned to sgallagh
- Issue set to the milestone: SSSD 1.4.0

3 years ago

SSSD is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in SSSD's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/SSSD/sssd/issues/1650

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.
