#56 Add the design page about Using the Global Catalog to speed up lookups by UID
Merged 6 years ago by jhrozek. Opened 6 years ago by jhrozek.
SSSD/ jhrozek/docs gc  into  master

file modified
+1
@@ -22,6 +22,7 @@ 

     :maxdepth: 1

  

     auto_private_groups

+    uid_negative_global_catalog

  

  

  Implemented in 1.15.x

@@ -0,0 +1,224 @@ 

+ .. highlight:: none

+ 

+ Using the Global Catalog to speed up lookups by ID

+ ==================================================

+ 

+ Related ticket(s):

+ ------------------

+      https://pagure.io/SSSD/sssd/issue/3468

+ 

+ Problem statement

+ -----------------

+ When SSSD is connected to a forest with multiple domains, each lookup,

+ unless qualified with the domain name, iterates over all the domains.

+ Moreover, some lookups, such as by-ID lookups, cannot be qualified using the

+ NSS interface at all.

+ 

+ This means the SSSD will issue N LDAP searches for N domains. If

+ the object SSSD is searching for exists in the LDAP database in one of the

+ domains, the performance impact can be mitigated with the already existing

+ option ``cache_first``, which will, even for non-qualified searches, first

+ check if the requested object exists in the local database and if it does,

+ searches the corresponding domain only.

+ 

+ But this option doesn't solve the problem of looking for objects, especially

+ numerical IDs, that do not exist in the remote database at all. A search for

+ such a non-existent object will always traverse all the domains every time the

+ negative cache from a previous request expires.

+ 

+ In environments that use the Global Catalog, this issue can be mitigated

+ by locating the object's domain in the Global Catalog, provided that the

+ search key is present in the Global Catalog in the first place.

+ 

+ Use-cases

+ ---------

+ Currently the primary use-case is SSSD joined to an AD forest consisting of

+ multiple domains and configured with ``id_provider=ad``, because only the AD

+ provider supports Global Catalog lookups. There are some plans to implement

+ the Global Catalog e.g. for FreeIPA, but so far no implementation exists.

+ 

+ At the same time, only environments that use POSIX UID and GID attributes set

+ by the administrator will benefit from this enhancement, because if the client

+ maps the IDs algorithmically from the SIDs, the AD provider is already able

+ to shortcut the by-ID request after computing the SID from the requested

+ ID and realizing that the domain SID does not come from the current domain.

+ 

+ The current state of Global Catalog support in SSSD

+ ---------------------------------------------------

+ The Global Catalog is an LDAP database, which contains a subset of attributes

+ about objects from all the domains in the whole forest. What attributes

+ are replicated to the Global Catalog is defined by the `Partial Attribute Set <https://social.technet.microsoft.com/wiki/contents/articles/23097.active-directory-attributes-in-the-partial-attribute-set.aspx>`_.

+ It is possible to list the attributes

+ that are replicated to the Global Catalog by running an LDAP search rooted at

+ the ``cn=schema,cn=configuration`` subtree and filtering on

+ ``isMemberOfPartialAttributeSet=TRUE``, for example::

+ 

+     ldapsearch -Y GSSAPI \

+                -H ldap://dc.win.trust.test:389 \

+                -b cn=schema,cn=configuration,dc=win,dc=trust,dc=test \

+                '(&(objectClass=attributeSchema)(isMemberOfPartialAttributeSet=TRUE))'

+ 

+ It is important to note that the POSIX attributes such as

+ ``uidNumber`` or ``gidNumber`` are neither part of the default Active

+ Directory schema, nor replicated to the Global Catalog by default.

+ To learn how to extend the schema to set the POSIX attributes at all,

+ follow the `Install Identity Management for UNIX Components <https://technet.microsoft.com/en-us/library/cc731178.aspx>`_

+ article on the Microsoft TechNet site. How to extend the Partial Attribute Set

+ is described for example in the `AD DS: Global Catalogs and the Partial Attribute Set <https://blogs.technet.microsoft.com/scotts-it-blog/2015/02/28/ad-ds-global-catalogs-and-the-partial-attribute-set/>`_

+ TechNet blog post.

+ 

+ The purpose of using the Global Catalog in SSSD is two-fold:

+ 

+  * to avoid having to connect to the LDAP server of a DC from every domain in the forest

+ 

+  * to look up the cross-domain members of Universal Groups, which are only present in the Global Catalog

+ 

+ Because not all the attributes required by SSSD are guaranteed to be

+ replicated to the Global Catalog (especially the ``uidNumber`` and

+ ``gidNumber`` attributes), SSSD runs a search that checks for

+ the presence of any objects with either ``uidNumber`` or ``gidNumber``

+ during the very first request for a numerical ID. If no objects with

+ either attribute are present, the Global Catalog support is disabled

+ except for looking up Universal Group members.

+ 

+ However, at the moment, SSSD will either use the whole entry it finds in

+ the Global Catalog or not use the Global Catalog at all. This puts

+ a bit of responsibility on the administrator in the sense that the

+ object in the Global Catalog must contain all the required attributes or

+ the administrator might need to disable the Global Catalog support

+ manually in the configuration file.  In the future (see e.g. ticket

+ `3538 RFE: Use the global catalog only to look up the entry DN

+ <https://pagure.io/SSSD/sssd/issue/3538>`_) we would like to change the

+ logic so that it uses the Global Catalog to look up the entry DN, but

+ then it would look up the entry attributes in the LDAP directory of the

+ object's domain. However, that enhancement is out of scope of what this

+ design page describes.

+ 

+ Overview of the solution

+ ------------------------

+ A new Data Provider method ``getAccountDomain()`` whose purpose is to locate

+ a domain an object resides in will be added. At the moment, only the AD

+ provider will implement this handler.

+ 

+ The responder's ``cache_req`` module will call this handler before iterating

+ over domains. For all domains except the one returned from the handler,

+ the ``cache_req`` module will set the requested object into negative cache.

+ This would cause the subsequent loops over the domains to just skip the

+ domains where the entry was not found and only look up the entry in the

+ domain that the ``getAccountDomain()`` method returned.

+ 

+ Implementation details

+ ----------------------

+ There are two parts to the implementation - the responder side, which mostly

+ touches the ``cache_req`` code and the provider side. The responder side

+ would also require adding some API to the negative cache module.

+ 

+ Responder changes - cache_req and negative cache

+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+ On the responder side, the ability to locate a domain of a requested object

+ will be provided by new ``cache_req`` plugin methods. Not all plugins will

+ be augmented with the methods that call the domain locator - at least in

+ the first iteration, only the plugins that search objects by ID will use

+ the new Data Provider API.

+ 

+ When looking up an entry, the ``cache_req`` request must first decide

+ whether it is worth calling the domain locator request at all. The locator

+ request should only be called when there are multiple domains to search

+ and the request is not already qualified with a domain name. Similarly,

+ the domain locator should not be called if the request is only evaluating

+ the cached data (``bypass_dp=True``, which is typically set during the

+ first pass when the ``cache_first`` option is enabled). Of course, the

+ locator would also only be called for plugins that implement the associated

+ methods.
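
The preconditions above can be read as a single predicate. The following is a minimal Python sketch of that reading, not the SSSD implementation (which is written in C); the ``Lookup`` class, its field names and ``should_call_locator`` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lookup:
    """Hypothetical stand-in for the state of one cache_req lookup."""
    domain_name: Optional[str]  # set when the caller qualified the request
    bypass_dp: bool             # True while only cached data is evaluated
    plugin_has_locator: bool    # the plugin implements the locator methods


def should_call_locator(lookup: Lookup, num_domains: int) -> bool:
    """Return True only when every precondition from the text holds."""
    if not lookup.plugin_has_locator:
        return False  # e.g. plugins that do not search by ID
    if num_domains < 2:
        return False  # a single domain leaves nothing to narrow down
    if lookup.domain_name is not None:
        return False  # already qualified with a domain name
    if lookup.bypass_dp:
        return False  # cache-only pass, e.g. first pass of cache_first
    return True
```

For example, ``should_call_locator(Lookup(None, False, True), 3)`` is true, while the same lookup with ``bypass_dp=True`` or with only one domain configured is not.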

+ 

+ When all the above evaluates into calling the locator (e.g. searching

+ a user UID while multiple domains are defined), the first step before

+ actually calling the locator DP method should still be looking into the

+ cache. This additional step ensures that looking up an ID from the first

+ defined domain in a setup with many domains wouldn't needlessly hit the

+ Global Catalog, while the entry is still cached in sysdb.

+ 

+ Finally, the responder would call the ``getAccountDomain`` Data Provider

+ method. If calling the DP method returns an error, this error is in no way

+ fatal, but instead, the ``cache_req`` code resumes the original codepath

+ where all domains are searched sequentially. One error code that signifies

+ that the back end as a whole doesn't support locating an ID's domain must be

+ added. When the ``cache_req`` code receives this error code, it

+ never calls the domain locator again for this domain.

+ 

+ On returning success from the ``getAccountDomain`` method, the string

+ returned from the method will contain the domain where the ID was found.

+ Only one domain can be returned; conflicting values in the ID space will

+ be detected on the provider side and handled by returning an error, which

+ will fall back to the sequential lookups.
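
The error handling described in the last two paragraphs might be modelled as below. This is a hedged Python sketch rather than SSSD code: ``Result``, ``locate_or_fall_back`` and the ``disabled`` set are hypothetical, and ``Result.NOT_SUPPORTED`` merely stands in for the ``ERR_GET_ACCT_DOM_NOT_SUPPORTED`` error code.

```python
from enum import Enum, auto
from typing import Callable, Optional, Set, Tuple


class Result(Enum):
    OK = auto()
    NOT_SUPPORTED = auto()  # models ERR_GET_ACCT_DOM_NOT_SUPPORTED
    ERROR = auto()          # any other failure, e.g. conflicting IDs


Locator = Callable[[int], Tuple[Result, Optional[str]]]


def locate_or_fall_back(disabled: Set[str], main_domain: str,
                        lookup_id: int, locator: Locator) -> Optional[str]:
    """Return the located domain, or None to search all domains in turn."""
    if main_domain in disabled:
        return None  # back end said earlier that it cannot locate IDs
    result, domain = locator(lookup_id)
    if result is Result.NOT_SUPPORTED:
        disabled.add(main_domain)  # never ask this back end again
        return None
    if result is not Result.OK:
        return None  # non-fatal: resume the sequential code path
    return domain
```

Returning ``None`` for every failure keeps the caller simple: it either gets exactly one domain name or carries on with the original loop over all domains.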

+ 

+ The returned domain name will be used to set a negative cache entry for

+ the looked up object in all domains except the one that was returned.

+ It is important to only mark (sub)domains that belong to the same "main"

+ domain with these negative cache entries, especially because internally

+ in the ``cache_req`` code, we use a flattened domain list to iterate over

+ in order to support custom domain lookup priorities. After this is done,

+ the ``cache_req`` code would loop back into its original logic, but the

+ negative cache entries will ensure that domains that do not contain this

+ ID are skipped.
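
To illustrate how the negative cache entries steer the resumed loop, here is a small Python model. It assumes that "the same main domain" means the main domain whose back end answered the locator request; the ``Domain`` class, the function names and the example domain names are all hypothetical and do not mirror the SSSD data structures.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Domain:
    name: str  # domain or subdomain name
    main: str  # name of the "main" domain this (sub)domain belongs to


def mark_other_domains(flat: List[Domain], main: str, located: str,
                       lookup_id: int, negcache: Set[Tuple[str, int]]) -> None:
    """Negatively cache lookup_id in every sibling of the located domain."""
    for dom in flat:
        if dom.main != main:
            continue  # never touch domains of another main domain
        if dom.name == located:
            continue  # this is where the ID lives
        negcache.add((dom.name, lookup_id))


def domains_to_search(flat, lookup_id, negcache):
    """The original loop, skipping negatively cached domains."""
    return [d.name for d in flat if (d.name, lookup_id) not in negcache]


# Flattened list: one unrelated domain plus an AD forest with two children.
flat = [Domain("implicit_files", "implicit_files"),
        Domain("ad.example", "ad.example"),
        Domain("child1.ad.example", "ad.example"),
        Domain("child2.ad.example", "ad.example")]
negcache = set()
mark_other_domains(flat, "ad.example", "child1.ad.example", 10001, negcache)
print(domains_to_search(flat, 10001, negcache))
# prints ['implicit_files', 'child1.ad.example']
```

The unrelated ``implicit_files`` domain is still searched because the locator of the AD back end has no authority over it.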

+ 

+ Because the loop over domains is resumed only after the locator was called,

+ there needs to be a way to avoid calling the locator too often. To this end,

+ a new negative cache container would be added. Under this container, we will

+ store the values of the objects we look up to notify the ``cache_req`` code

+ that either the locator must be called again or that calling the locator

+ can be skipped this time and the per-domain-per-ID negative cache entries

+ can be reused again during the loop over domains.
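
One way to picture the new container is as a map from the looked-up value to an expiry time. The Python sketch below is only an assumption about its behaviour: the class name, its methods and the timeout handling are hypothetical, and the design text does not say which timeout applies.

```python
import time
from typing import Callable, Dict


class LocatorNegCache:
    """Remembers for which IDs the locator was called recently."""

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout
        self._clock = clock  # injectable so that expiry can be tested
        self._expires: Dict[int, float] = {}

    def record(self, lookup_id: int) -> None:
        """Note that the locator has just been called for lookup_id."""
        self._expires[lookup_id] = self._clock() + self._timeout

    def can_skip_locator(self, lookup_id: int) -> bool:
        """True while the per-domain negative entries can be reused."""
        expires = self._expires.get(lookup_id)
        return expires is not None and self._clock() < expires
```

While ``can_skip_locator()`` returns true, the loop over domains relies on the existing per-domain-per-ID negative cache entries; once the entry expires, the locator is called again.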

+ 

+ Provider changes - the ``getAccountDomain`` implementation

+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+ All providers except ``id_provider=ad`` will set a dummy ``getAccountDomain``

+ handler which always returns ``ERR_GET_ACCT_DOM_NOT_SUPPORTED``. Therefore,

+ for all domains except the ones with the AD provider, the

+ ``getAccountDomain`` method will only be called once and then disabled.

+ 

+ The AD provider implementation of the ``getAccountDomain`` method will

+ search the Global Catalog with an empty search base, thus searching across

+ all the domains in the forest. Two details are important to bring up with

+ respect to this search:

+ 

+     * In order for this lookup to be useful even for non-existent IDs,

+       the Global Catalog search must be "authoritative". In other words,

+       not finding the entry in the Global Catalog must be considered as if

+       the entry doesn't exist.

+ 

+     * Because the POSIX IDs are not replicated by default to the Global

+       Catalog, the ``getAccountDomain`` request must check if any POSIX

+       IDs are replicated to the Global Catalog at all.
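
Both points can be combined in a toy model of the provider-side decision. The Python sketch below treats the Global Catalog as a plain list of dictionaries; ``get_account_domain``, the ``Result`` values and the ``domain`` key are hypothetical, and returning a "not supported" result when no POSIX IDs are replicated is an inference from the rest of this page rather than something stated here.

```python
from enum import Enum, auto
from typing import Any, Dict, List, Optional, Tuple


class Result(Enum):
    OK = auto()
    NOT_FOUND = auto()      # authoritative: the ID exists in no domain
    NOT_SUPPORTED = auto()  # models ERR_GET_ACCT_DOM_NOT_SUPPORTED
    CONFLICT = auto()       # same ID in several domains, caller falls back


def get_account_domain(gc_entries: List[Dict[str, Any]], uid: int
                       ) -> Tuple[Result, Optional[str]]:
    """gc_entries models the Global Catalog as dicts with a 'domain' key
    and, when POSIX attributes are replicated, a 'uidNumber' key."""
    if not any("uidNumber" in e for e in gc_entries):
        return (Result.NOT_SUPPORTED, None)  # no POSIX IDs in the GC at all
    domains = {e["domain"] for e in gc_entries if e.get("uidNumber") == uid}
    if not domains:
        return (Result.NOT_FOUND, None)  # treated as "does not exist"
    if len(domains) > 1:
        return (Result.CONFLICT, None)  # conflicting values in the ID space
    return (Result.OK, domains.pop())
```

Only the ``OK`` case yields a domain name; the other results tell the responder either that the ID exists nowhere in the forest or that it should not rely on the locator for this request.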

+ 

+ 

+ Configuration changes

+ ---------------------

+ None. However, it should be noted that disabling the Global Catalog support

+ as a whole in SSSD would disable the ``getAccountDomain`` in the sense that

+ it would always return ``ERR_GET_ACCT_DOM_NOT_SUPPORTED`` which would in turn

+ instruct the responder to never call the ``getAccountDomain`` request again.

+ 

+ Therefore, disabling the Global Catalog can be used to disable this

+ new functionality.

+ 

+ How To Test

+ -----------

+ To test the functionality itself, an AD forest with multiple domains should

+ be used. Please make sure the POSIX attributes are present and replicated

+ to the Global Catalog. Requesting a POSIX ID from a domain outside the joined

+ one should first consult the Global Catalog and then proceed to only searching

+ the individual domain where the ID was located.

+ 

+ It is important to test that there are no regressions in setups that either

+ do not use POSIX IDs at all or do not replicate the POSIX IDs to the Global

+ Catalog. In these setups, as well as configurations that use a different ID

+ provider, the ``cache_req`` code must only attempt to call the locator once.

+ 

+ Similarly, setups that use multiple domains (and remember that since

+ Fedora-26, all SSSD installations automatically enable the ``files``

+ provider) must see no regressions.

+ 

+ Authors

+ -------

+  * Jakub Hrozek ``<jhrozek@redhat.com>``

rebased onto 4fe0d9d

6 years ago

Since this PR has been open for a month now with no comments and the code was merged, I'm going to push the design page.

Pull-Request has been merged by jhrozek

6 years ago