4cd1a24 Ticket 49372 - filter optimisation improvements for common queries

Authored and Committed by William Brown 6 years ago
    Ticket 49372 - filter optimisation improvements for common queries
    
    Bug Description:  Due to the way we apply indexes to searches
    and the presence of the "filter test threshold", a number of
    queries could be made faster if their authors understood the
    internals of our idl_set and index mechanisms. However, instead
    of expecting application authors to do this, we should provide
    this optimisation in the server.
    
    Fix Description:  In the server we have some cases we want to
    achieve, and some to avoid:
    
    * If a union has an unindexed candidate, we throw away all work
      and return an ALLIDS idl.
    * In an intersection, if we have an idl that is smaller than the
      filter test threshold, we immediately return that idl rather
      than accessing all the others, and perform a filter test.
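
    The two properties above can be illustrated with a small model.
    This is only a sketch and not the server's C code: the function
    names, the dict-based "index", the ALLIDS sentinel and the
    threshold value of 10 are all assumptions made for illustration.
    It records which indexes are opened so that the effect of term
    order is visible.

```python
ALLIDS = "ALLIDS"           # hypothetical sentinel: every entry is a candidate
FILTER_TEST_THRESHOLD = 10  # assumed value, for illustration only


def evaluate_union(terms, index):
    """Model a union: an unindexed term discards all work so far."""
    opened = []
    result = set()
    for name in terms:
        idl = index.get(name)  # None models a term with no usable index
        if idl is None:
            return ALLIDS, opened
        opened.append(name)
        result |= idl
    return result, opened


def evaluate_intersection(terms, index, threshold=FILTER_TEST_THRESHOLD):
    """Model an intersection: a small idl is returned immediately."""
    opened = []
    result = None
    for name in terms:
        idl = index[name]
        opened.append(name)
        if len(idl) < threshold:
            # Small enough: stop here and let the filter test do the rest.
            return idl, opened
        result = idl if result is None else result & idl
    return result, opened


# Toy index: term -> set of entry ids. The substring term has no entry.
index = {
    "cn=a": {1},
    "cn=b": {2},
    "uid=william": {42},
    "objectClass=account": set(range(1000)),
}

# Union: the unindexed term last wastes two index reads, first wastes none.
print(evaluate_union(["cn=a", "cn=b", "sudoHost=*x*"], index))
print(evaluate_union(["sudoHost=*x*", "cn=a", "cn=b"], index))

# Intersection: the small idl first avoids opening the large index.
print(evaluate_intersection(["objectClass=account", "uid=william"], index))
print(evaluate_intersection(["uid=william", "objectClass=account"], index))
```

    In this model the result is the same for either order; only the
    list of opened indexes changes, which is the cost being targeted.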
    
    Knowing these two properties, we can now look at improving filters
    for queries.
    
    In a common case, SSSD will give us a query which is a union of
    host cn and sudoHost rules. However, the sudoHost rules are
    substring searches that cannot be indexed, so the whole filter
    becomes an unindexed search. For example:
    
    (|(cn=a)(cn=b)(cn= ....)(sudoHost=[*]*))
    
    So in this case we want to move the substring to the front of the
    filter, so that if it is unindexed we fail immediately with ALLIDS
    rather than opening the cn index.
    
    For intersection, we often see:
    
    (&(objectClass=account)(objectClass=posixAccount)(uid=william))
    
    The issue here is that the idls for account and posixAccount may
    each contain 100,000 items. Even with idl lookthrough limits, we
    don't know whether we will exceed them until we start to read
    these idls.
    
    A better query is:
    
    (&(uid=william)(objectClass=account)(objectClass=posixAccount))
    
    Because the uid=william index will contain a single item, this
    puts us below the filter test threshold, and we will not open the
    objectClass indexes.
    
    In fact, in an intersection, it is almost always better to perform
    simple equalities first:
    
    (&(uid=william)(modifyTimestamp>=...)(sn=br*)(objectClass=posixAccount))
    
    In most other cases we will not greatly benefit from rearrangement:
    due to the size of the idls involved, we won't hit the filter test
    threshold. For example:
    
    (&(modifyTimestamp>=...)(sn=br*)(objectClass=posixAccount))
    
    would not be significantly better under any arrangement that we
    could choose without knowing the content of sn.
    
    So in summary, our rules for improving queries are:
    
    * unions-with-substrings should have substrings *first*
    * intersection-with-equality should have all non-objectclass
      equality filters *first*.
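
    As a rough sketch of these two rules (not the code in this
    commit): the tuple representation of filter terms, the "eq" /
    "sub" / "ge" operator tags and the reorder() helper below are
    hypothetical, and nested filters are not handled. A stable sort
    is used so that terms within each group keep their original
    relative order.

```python
def is_substring(term):
    """term is a hypothetical (attribute, operator, value) tuple."""
    return term[1] == "sub"


def is_non_objectclass_equality(term):
    attr, op, _value = term
    return op == "eq" and attr.lower() != "objectclass"


def reorder(op, terms):
    """Return the terms of one flat filter, reordered per the rules above."""
    if op == "|":
        # unions: substrings first
        key = lambda t: 0 if is_substring(t) else 1
    elif op == "&":
        # intersections: non-objectclass equality filters first
        key = lambda t: 0 if is_non_objectclass_equality(t) else 1
    else:
        return list(terms)
    # sorted() is stable, so ties keep their original order.
    return sorted(terms, key=key)


union = [("cn", "eq", "a"), ("cn", "eq", "b"), ("sudoHost", "sub", "*x*")]
intersection = [
    ("objectClass", "eq", "account"),
    ("objectClass", "eq", "posixAccount"),
    ("uid", "eq", "william"),
]

print(reorder("|", union))         # substring term moves to the front
print(reorder("&", intersection))  # uid term moves to the front
```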
    
    https://pagure.io/389-ds-base/issue/49372
    
    Author: wibrown
    
    Review by: lkrispen, mreynolds (Thanks!)
    
        
file modified
+7 -0
file modified
+170 -33