#626 memberOf performance enhancements
Closed: wontfix 3 years ago by pbrezina. Opened 13 years ago by jhrozek.

These improvements were proposed by Simo on sssd-devel: https://fedorahosted.org/pipermail/sssd-devel/2010-September/004547.html

- If we use rfc2307bis w/o nested groups we can count the number of members on the group entry and switch to a full user enumeration if the number of members is greater than a defined (potentially user defined) threshold. Assuming a threshold value of 100, if we have 10 members we just do 10 lookups, while, if we have > 100 member we do a full enumeration (it's a single ldap search, and if modifyTimestamp is used also highly optimized after the first search) so we are sure all users are there with the right name.

- If we use rfc2307bis with nested groups we do the same as in above, but we add an internal user lookup against the cache we just populated to determine if some members are groups that need further lookups in order to complete the group unrolling operation. Further optimization that was proposed by Ralf Haferkamp was to  not do an extra cache lookup, but rather addtional LDAP lookups right when you receive the entries of the group members. Provided the correct LDAP attributes were requested (i.e. user attributes + groups attributes).

- If we use IPA (rfc2307bis + nested groups + unrolled members in memberOf attribute), we can do a single search with memberOf=groupDN, as IPa's memberOf fully resolves nested groups. This would return all and only the users members of the group. So if the group has 101 members in a directory of 50k users we do not do a full 50k users enumeration just because we have a group that is 1 above the threshold.

Fields changed

summary: enumeration improvements for RFC 2307 => enumeration improvements for RFC 2307bis

Fields changed

milestone: NEEDS_TRIAGE => SSSD 1.2.4

Fields changed

owner: somebody => simo

Fields changed

status: new => assigned

Resolving groups with rfc2307bis.

rfc2307bis schema has the advantage of allowing to nest groups because members
are generic DNs. On the flip side it requires more lookups because the DN does
not, in itself, give indication of whther a member is a user or a groupand
what is its name (unless DN decomposition is used and strict rules about how
the DN is composed anre enforced).

With nesting the problem is more complex, without nesting you have basically
the rfc2307 problem as you will ignore any member groups (all is needed is to
lookup users).

Within the rfc2307bis problem we have 3 sub-cases.

1. "Full" memberof information, this means the memberof attribute is
"unfolded", the user has an entry for each group it is directly or indirectly\
memberof (IPA has full memberof info).

2. "simple" memberof information, memberof is just a mirror of the member
attribute, only direct membership in groups is represented in the user.

3. "no" memberof information. Self explanatory.

Solving each case is progressively more complicated and requires more lookups:

1. (Full Memberof) In the first case all is needed to get a list of all members
of a group, including indirect members is to make a single search throughout
the directory for any entry with the a memberof attribute matching the group's
DN. In our case, because we allow to have distinct users and groups search base
DNs we might need to searches, one for users and one for groups.
Once the query returns we are guaranteed by the server that we have all entries
that are relevant for the group.
So the procedure in this case is simply:
a) search group by name (LDAP op)
b) extract the original DN (groupOrigDN) from the results
c) search the local database to see if all members are already available
   (search by originalDN).
 c.1) if all members are available and they are all users, we are done.
 c.1.1) save the group. STOP
 c.2) if not all members are available, or if some of them are groups we need
      to refresh the DB in order to fetch the new memberships
d) search all users that have memberof=<groupOrigDN>
e) store all users
f) search all groups that have memberof=<groupOrigDN>
   (this can be collapsed into d in some cases, but e and g MUST still be
   separate steps, as we need ALL users in the DB before we can start saving
   groups, so to avoid missing memberships)
g) store all groups. STOP

2. (simple memberof) The steps are similar to point 1 except that we do not
have a complete set of results after we search for groups with the memberof
attribute. So after c) we enter this loop:
dd) save list of members we will have to retrieve (not cached ones)
ee) search all objects that have memberof=<groupOrigDN>
ff) for each group recurse this procedure as if it were the group we initially
    were looking for.
    decrease the nesting number so that we stop recursing when the maximum
    nesting level is reached
gg) when nesting == 0, save all users.
hh) then save all groups. STOP

This case requires that we keep a fast access cache of the groups we have
already retrieved so that we do not waste queries by searching the same group
more than once.
Also to avoid unnecessary queries for users it is necessary to keep a fast
access cachce withe the list of users that we already have in the DB or that
we have previously retrieved. If, while looking into the members of a group
we realize all users (and potentially all nested groups) are already in the
list we can consider the job for that group finished.
This will allow saving many queries in pathological or otherwise very
complicated group hierarchies where the same users are members of multiple
groups that all end up being member of a root group.

Unfortunately even doing this will not allow to avoid doing a memberof search
for users, which will return results we already have retrieved. In that case
all we can do is to just throw away duplicates to save memory and continue the
loop.

3. (no memberof) This is the worst case in some situations (group with a lot of
users).
The same rules as for 2 applies, only we need to issue multiple base searches
for each group member instead of a single memberof search for all members.
Using some heuristics, if we determine that we are going to make too many LDAP
operations, a full *user* enumeration can be made upfront in order to reduce
the searches only to the individual number of nested groups part of the
hierarchy. This will reduce the problem to something similar to case 2.
(actually case 2 could use this same technique in some extreme cases, although
it is difficult to calibrate heurstics appropriately, as in some cases a full
enumeration can be very expensive on its own).

How do we know if the memberOf contains the full list or just direct group memberships, i.e. how we differentiate cases 1 & 2? Is this something that requires admin to pre-configure and tell SSSD or it can be determined dynamically at runtime?

For now I am going with explicit configuration.
Technically we can autodetect that as soon as we get the results from meberof. If number of memberof entries > number of group member attributes then we SHOULD be in situation 1.
But, because there are no transactions available over LDAP we cannot be 100% sure, we could be racing with an admin just adding members to the group we retrieved with the previous LDAP operation.

Simo.

Fields changed

owner: simo => sgallagh
status: assigned => new

Fields changed

status: new => assigned

Moving this to NEEDS_TRIAGE.

SSSD 1.2.4 implemented a subset of this bug (the general case) but this ticket should be converted into an enhancement to improve the performance when memberOf is available.

component: SSSD => LDAP Provider
milestone: SSSD 1.2.4 => NEEDS_TRIAGE
type: defect => enhancement

Fields changed

milestone: NEEDS_TRIAGE => SSSD 1.5.0

Fields changed

priority: major => critical

Fields changed

priority: critical => minor

Moving back to NEEDS_TRIAGE.

I'm proposing that this performance enhancement should be deferred to 1.6.0, since getting it wrong has the potential for causing serious regressions.

milestone: SSSD 1.5.0 => NEEDS_TRIAGE
status: assigned => new

Replying to [comment:14 sgallagh]:

Moving back to NEEDS_TRIAGE.

I'm proposing that this performance enhancement should be deferred to 1.6.0, since getting it wrong has the potential for causing serious regressions.

Agree.

Do this one early in 1.6.0 to have time to fix any regressions before release.

milestone: NEEDS_TRIAGE => SSSD 1.6.0
priority: minor => critical

Fields changed

coverity: =>
summary: enumeration improvements for RFC 2307bis => MemeberOf performance enhancements
upgrade: => 0

Fields changed

milestone: SSSD 1.6.0 => SSSD 1.7.0

Fields changed

summary: MemeberOf performance enhancements => memberOf performance enhancements

Fields changed

milestone: SSSD 1.7.0 => SSSD 1.6.1
patch: => 0

The performance enhancements gained by the support of ASQ and DEREF have made this enhancement unnecessary.

milestone: SSSD 1.6.1 => SSSD Deferred
rhbz: =>

Fields changed

rhbz: => 0

Fields changed

blockedby: =>
blocking: =>
design: =>
design_review: => 0
feature_milestone: =>
fedora_test_page: =>
priority: critical => major
selected: =>

Metadata Update from @jhrozek:
- Issue assigned to sgallagh
- Issue set to the milestone: SSSD Patches welcome

7 years ago

Thank you for taking time to submit this request for SSSD. Unfortunately this issue was not given priority and the team lacks the capacity to work on it at this time.

Given that we are unable to fulfill this request I am closing the issue as wontfix.

If the issue still persist on recent SSSD you can request re-consideration of this decision by reopening this issue. Please provide additional technical details about its importance to you.

Thank you for understanding.

Metadata Update from @pbrezina:
- Issue close_status updated to: wontfix
- Issue status updated to: Closed (was: Open)

3 years ago

SSSD is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in SSSD's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/SSSD/sssd/issues/1668

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Login to comment on this ticket.

Metadata