#7786 Index accessruletype, hostcategory, ipaenabledflag, ipserviceport, and ipserviceprotocol by default
Closed: fixed 5 years ago Opened 5 years ago by jaredl.

Request for enhancement

As an FreeIPA server administration, I expect that the default indexes shipped w/ FreeIPA are optimized for sssd's usage of FreeIPA as a backend.

Issue

Over the last few weeks, I have been working through a horrible performance issue with our production FreeIPA v4.5.4 cluster. The clients are a mix of CentOS 6 and 7 hosts all running sssd with the latest CentOS updates release for the relative platform. Outside of SSSD, we have a small handful of applications using FreeIPA for LDAP authentication.

Working with @mreynolds on Freenode, running logconv.pl --verbose /var/log/dirsrv/slapd-$REALM/access showed that we had a very high number of unindexed searches.

Steps to Reproduce

I'm not entirely sure on the best way to reproduce this issue, production, which sees it the worst, is currently 13,000+ ipa-clients connected to a 16 FreeIPA 4.5.4 server cluster.
1. Install FreeIPA v4.5.4
2. Connect some clients via SSSD
3. Use logconv.pl to see the unindexed searches

Actual behavior

Clients are performing hundreds of unindexed searches every minute, resulting in significant system performance degradation.

Expected behavior

Out of the box, I would expect that the common LDAP searches performed by SSSD would be indexed and that going forward, FreeIPA/SSSD would ensure that any changes to the search pattern would remain indexed if feasible.

Version/Release/Distribution

FreeIPA servers:

# rpm -q freeipa-server freeipa-client ipa-server ipa-client 389-ds-base pki-ca krb5-server
package freeipa-server is not installed
package freeipa-client is not installed
ipa-server-4.5.4-10.el7.centos.4.4.x86_64
ipa-client-4.5.4-10.el7.centos.4.4.x86_64
389-ds-base-1.3.7.5-28.el7_5.x86_64
pki-ca-10.5.1-15.el7_5.noarch
krb5-server-1.15.1-19.el7.x86_64

CentOS 7 Client:

# rpm -q ipa-client sssd
ipa-client-4.5.4-10.el7.centos.4.4.x86_64
sssd-1.16.0-19.el7_5.8.x86_64

CentOS 6 Client:

# rpm -q ipa-client sssd
ipa-client-3.0.0-51.el6.centos.x86_64
sssd-1.13.3-60.el6.x86_64

Additional info:

We've been able to mititagate the significant performance degradation by executing the following on all of our FreeIPA servers:
1. Create the new indexes

ldapmodify -x -D "cn=Directory Manager" -W <<EOF
dn: cn=ipaenabledflag,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
changetype: add
objectClass:top
objectClass:nsIndex
nssystemindex: false
nsIndexType: eq
cn: ipaenabledflag

dn: cn=accessruletype,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
changetype: add
objectClass:top
objectClass:nsIndex
nssystemindex: false
nsIndexType: eq
cn: accessruletype

dn: cn=hostcategory,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
changetype: add
objectClass:top
objectClass:nsIndex
nssystemindex: false
nsIndexType: eq
cn: hostcategory

dn: cn=ipserviceport,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
changetype: add
objectClass:top
objectClass:nsIndex
nssystemindex: false
nsIndexType: eq
cn: ipserviceport

dn: cn=ipserviceprotocol,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
changetype: add
objectClass:top
objectClass:nsIndex
nssystemindex: false
nsIndexType: eq
cn: ipserviceprotocol

EOF
  1. Reindex userRoot:
db2index.pl -D 'cn=Directory Manager' -w - -n userRoot

These have now been running in production, staging, and development for over a week and the results are incredible. On average, after creating the indexes, we saw: A 15% drop in % system CPU per instance, a 46% drop in 5 minute system load per instance, a 44% drop in normalized CPU usage of the ns-slapd process all while holding the same number of client connections to the database. To help visualize this, I've attached two graphs during the time when I rolled out the indexes. Note that no other system changes were performed, only the indexes were added and userRoot was re-indexed.

max_system-load-5.png
ns-slapd_cpu_normailized.png

Finally, I've also included some of the output from logconv.pl on our access logs before the index creation. There's a TON of output, so this is just a subset of it:

----------- Access Log Output ------------

Start of Logs:    14/Nov/2018:18:29:11.243162816
End of Logs:      14/Nov/2018:19:39:05.712472728

Processed Log Time:  1 Hours, 9 Minutes, 54.46931456 Seconds

Restarts:                     0
Secure Protocol Versions:
  - TLS1.2 256-bit AES-GCM (51 connections)

Peak Concurrent Connections:  36
Total Operations:             833432
Total Results:                843068
Overall Performance:          100.0%

Total Connections:            5296          (1.26/sec)  (75.76/min)
 - LDAP Connections:          5270          (1.26/sec)  (75.38/min)
 - LDAPI Connections:         1             (0.00/sec)  (0.01/min)
 - LDAPS Connections:         25            (0.01/sec)  (0.36/min)
 - StartTLS Extended Ops:     26            (0.01/sec)  (0.37/min)

Searches:                     812785        (193.78/sec)  (11626.52/min)
Modifications:                7734          (1.84/sec)  (110.63/min)
Adds:                         73            (0.02/sec)  (1.04/min)
Deletes:                      8             (0.00/sec)  (0.11/min)
Mod RDNs:                     0             (0.00/sec)  (0.00/min)
Compares:                     0             (0.00/sec)  (0.00/min)
Binds:                        5242          (1.25/sec)  (74.98/min)

Proxied Auth Operations:      0
Persistent Searches:          0
Internal Operations:          0
Entry Operations:             0
Extended Operations:          7109
Abandoned Requests:           49
Smart Referrals Received:     0

VLV Operations:               432
VLV Unindexed Searches:       0
VLV Unindexed Components:     0
SORT Operations:              432

Entire Search Base Queries:   5882
Paged Searches:               266610
Unindexed Searches:           0
Unindexed Components:         51476
  Unindexed Component #1 (notes=U)
  -  Date/Time:             14/Nov/2018:18:29:53
  -  Connection Number:     3866493
  -  Operation Number:      12
  -  Etime:                 0.0000208727
  -  Nentries:              0
  -  IP Address:            Unknown_Host
  -  Search Base:           cn=accounts,dc=example,dc=local
  -  Search Scope:          2 (subtree)
  -  Search Filter:         (&(ipserviceport=5141)(ipserviceprotocol=tcp)(objectclass=ipservice))

  Unindexed Component #2 (notes=U)
  -  Date/Time:             14/Nov/2018:18:31:11
  -  Connection Number:     3866581
  -  Operation Number:      27
  -  Etime:                 1.1235918834
  -  Nentries:              0
  -  IP Address:            Unknown_Host
  -  Search Base:           cn=accounts,dc=example,dc=local
  -  Search Scope:          2 (subtree)
  -  Search Filter:         (&(ipserviceport=57672)(ipserviceprotocol=tcp)(objectclass=ipservice))

  Unindexed Component #3 (notes=U)
  -  Date/Time:             14/Nov/2018:18:30:37
  -  Connection Number:     3866744
  -  Operation Number:      516
  -  Etime:                 0.0208673499
  -  Nentries:              6
  -  IP Address:            Unknown_Host
  -  Search Base:           cn=hbac,dc=example,dc=local
  -  Search Scope:          2 (subtree)
  -  Search Filter:         (&(objectclass=ipahbacrule)(ipaenabledflag=true)(accessruletype=allow)(|(hostcategory=all)(memberhost=fqdn=herp-abc-derp-4.ops.example.local,cn=computers,cn=accounts,dc=example,dc=local)(memberhost=cn=herp-prod,cn=hostgroups,cn=accounts,dc=example,dc=local)(memberhost=cn=herp,cn=hostgroups,cn=accounts,dc=example,dc=local)(memberhost=cn=herp,cn=ng,cn=alt,dc=example,dc=local)(memberhost=ipauniqueid=2744511c-1234-11e6-9318-0656f5e1b0d2,cn=hbac,dc=example,dc=local)(memberhost=cn=herp-prod,cn=ng,cn=alt,dc=example,dc=local)(memberhost=ipauniqueid=3a75e6ba-1234-11e6-93f5-0656f5e1b0d2,cn=hbac,dc=example,dc=local)))

  Unindexed Component #4 (notes=U)
  -  Date/Time:             14/Nov/2018:18:31:11
  -  Connection Number:     3866763
  -  Operation Number:      513
  -  Etime:                 6.0482156752
  -  Nentries:              6
  -  IP Address:            Unknown_Host
  -  Search Base:           cn=hbac,dc=example,dc=local
  -  Search Scope:          2 (subtree)
  -  Search Filter:         (&(objectclass=ipahbacrule)(ipaenabledflag=true)(accessruletype=allow)(|(hostcategory=all)(memberhost=fqdn=blah-abc-derp-1.ops.example.local,cn=computers,cn=accounts,dc=example,dc=local)(memberhost=cn=blah-prod,cn=hostgroups,cn=accounts,dc=example,dc=local)(memberhost=cn=blah,cn=hostgroups,cn=accounts,dc=example,dc=local)(memberhost=cn=blah,cn=ng,cn=alt,dc=example,dc=local)(memberhost=ipauniqueid=f9f964bc-e70c-1234-8474-0656f5e1b0d2,cn=hbac,dc=example,dc=local)(memberhost=cn=blah-prod,cn=ng,cn=alt,dc=example,dc=local)(memberhost=ipauniqueid=0d496b52-e70d-11e6-8c4d-0656f5e1b0d2,cn=hbac,dc=example,dc=local)))

  Unindexed Component #5 (notes=U)
  -  Date/Time:             14/Nov/2018:18:32:35
  -  Connection Number:     3866973
  -  Operation Number:      336
  -  Etime:                 0.0000372680
  -  Nentries:              0
  -  IP Address:            Unknown_Host
  -  Search Base:           cn=accounts,dc=example,dc=local
  -  Search Scope:          2 (subtree)
  -  Search Filter:         (&(ipserviceport=57068)(ipserviceprotocol=tcp)(objectclass=ipservice))
FDs Taken:                    5296
FDs Returned:                 5271
Highest FD Taken:             1285

Broken Pipes:                 0
Connections Reset By Peer:    0
Resource Unavailable:         0
Max BER Size Exceeded:        0

Binds:                        5242
Unbinds:                      5247
----------------------------------
 - LDAP v2 Binds:             22
 - LDAP v3 Binds:             5220
 - AUTOBINDs(LDAPI):          0
 - SSL Client Binds:          0
 - Failed SSL Client Binds:   0
 - SASL Binds:                5040
   - GSSAPI: 5034
   - GSS-SPNEGO: 6
 - Directory Manager Binds:   2
 - Anonymous Binds:           174


----- Connection Latency Details -----

 (in seconds)           <=1     2       3       4-5     6-10    11-15   >15
 --------------------------------------------------------------------------
 (# of connections)     164     3       1       5092

----- Top 20 Most Frequent etimes -----

7               etime=0.0000224061
7               etime=0.0000290628
7               etime=0.0000294603
7               etime=0.0000218388
7               etime=0.0000228430
6               etime=0.0000225754
6               etime=0.0000245770
6               etime=0.0000284386
6               etime=0.0000221460
6               etime=0.0000220304
6               etime=0.0000208112
6               etime=0.0000228412
6               etime=0.0000223591
6               etime=0.0000102264
6               etime=0.0000257960
6               etime=0.0000224595
6               etime=0.0000232533
6               etime=0.0000232484
6               etime=0.0000228629
6               etime=0.0000254638


----- Top 20 Longest etimes -----

etime=17.0174788268    1
etime=7.1987212142    1
etime=7.1983560048    1
etime=7.1962944860    1
etime=7.1962029311    1
etime=7.1959907860    1
etime=7.1933721133    1
etime=7.1918061226    1
etime=7.1913044993    1
etime=7.1908861293    1
etime=7.1906954259    1
etime=7.1904359425    1
etime=7.1886884774    1
etime=7.1873965534    1
etime=7.1873539693    1
etime=7.1853073733    1
etime=7.1850723668    1
etime=7.1844387007    1
etime=7.1829720821    1
etime=7.1771563550    1


----- Top 20 Largest nentries -----

nentries=2000                 420
nentries=934                    6
nentries=15                 36796
nentries=14                    10
nentries=10                    39
nentries=6                  33994
nentries=5                   2797
nentries=4                      2
nentries=3                   1400
nentries=2                  36793
nentries=1                 434767
nentries=0                 265761


----- Top 20 Most returned nentries -----

434767          nentries=1
265761          nentries=0
36796           nentries=15
36793           nentries=2
33994           nentries=6
2797            nentries=5
1400            nentries=3
420             nentries=2000
39              nentries=10
10              nentries=14
6               nentries=934
2               nentries=4



----- Extended Operations -----

4082        2.16.840.1.113730.3.5.12     DS90 Start Replication Request
2991        2.16.840.1.113730.3.5.5     End Replication Request (incremental update)
26          1.3.6.1.4.1.1466.20037      Start TLS
10          1.3.6.1.4.1.4203.1.11.3     Other


----- Top 20 Most Requested Attributes -----

582185      objectClass
532315      cn
494927      ipaUniqueID
340469      memberOf
301557      entryusn
278866      ipaNTSecurityIdentifier
278541      gidNumber
272112      modifyTimestamp
272112      userPassword
247421      krbPrincipalName
247420      krbLastPwdChange
247420      krbPasswordExpiration
241536      ipaUserAuthType
240991      nsAccountLock
229843      ipaSshPubKey
198242      uid
190444      authorizedService
190444      homeDirectory
190444      host
190444      loginallowedtimemap
----- Recommendations -----

 1.  You have unindexed components, this can be caused from a search on an unindexed attribute, or your returned results exceeded the allidsthreshold.  Unindexed components are not recommended. To refuse unindexed searches, switch 'nsslap

 2.  You have a majority of etimes that are greater than zero, you may want to investigate this performance problem.

Cleaning up temp files...
Done.

More than happy to provide any other information that might be useful, this is been a super interesting experience to work through (while the impact was not ideal). Major thanks to @mreynolds for all of the help as well!


Thank you for this excellent and thorough analysis! The numbers and graphs make an appealing point for the additional indexes. I have seen the same missing indexes in https://pagure.io/freeipa/issue/6722 but never realized, that the missing indexes had such a big impact on a large installation.

@jhrozek please take a look.

Metadata Update from @cheimes:
- Issue priority set to: important
- Issue set to the milestone: FreeIPA 4.6.5

5 years ago

I'm not sure what exactly should I add except yes, these searches appear to be coming from SSSD. In particular:

  1. (&(ipserviceport=5141)(ipserviceprotocol=tcp)(objectclass=ipservice)) This looks like getservbyport(2). If you're not storing services in IPA/LDAP, then a quick mitigation might be to remove the sss module from the services line in /etc/nsswitch.conf.

  2. (&(objectclass=ipahbacrule)(ipaenabledflag=true)(accessruletype=allow)(|(hostcategory=all)(memberhost=fqdn=herp-abc-derp-4.ops.example.local,cn=computers,cn=accounts,dc=example,dc=local)(memberhost=cn=herp-prod,cn=hostgroups,cn=accounts,dc=example,dc=local)(memberhost=cn=herp,cn=hostgroups,cn=accounts,dc=example,dc=local)(memberhost=cn=herp,cn=ng,cn=alt,dc=example,dc=local)(memberhost=ipauniqueid=2744511c-1234-11e6-9318-0656f5e1b0d2,cn=hbac,dc=example,dc=local)(memberhost=cn=herp-prod,cn=ng,cn=alt,dc=example,dc=local)(memberhost=ipauniqueid=3a75e6ba-1234-11e6-93f5-0656f5e1b0d2,cn=hbac,dc=example,dc=local)))
    This is HBAC access control. I'm not so well-versed in the index schema syntax, but as long as you added the hostcategory, ipaenabled and accessruletype attribute indexes, that sounds sane to me

@jhrozek You just added all information that I was looking for and even more.

The index definitions look sane to me. 389-DS has multiple types of indexes. There are

  • eq, full match equality
  • pres, presence, also used for (attr=*))
  • sub, sub string match
  • aprox approximate

and two more index types for i18n and browsing. Since SSSD uses full matches, the eq index type is sufficient for these attributes.

For your information, I just created two more issues related to missing indexes:

Metadata Update from @cheimes:
- Issue tagged with: performance

5 years ago

The indices on ipaEnableFlag and hostCategory are for entries that are based on the abstract object class ipaAssociation. The structural classes include ipaCaAcl, ipaSudoRule, ipaSELinuxUserMap, ipaNISNetgroup, and ipaHBACRule. These entries have two more attributes besides hostCategory that may need to get indexed: serviceCategory and userCategory. The other ipaAssociation attributes memberHost, memberService, and memberUser are already indexed.

I could not find any filtering on serviceCategory and userCategory in SSSD or IPA. The additional indices may not be required.

Metadata Update from @cheimes:
- Custom field on_review adjusted to https://github.com/freeipa/freeipa/pull/2649
- Issue assigned to cheimes

5 years ago

The userCategory and serviceCategory attributes are used when processing sudo rules from IPA, internally SSSD converts them into a common format for processing. But there is no filter that includes these attributes, they are just requested and later processed.

I moved the ipService related performance problem into ticket https://pagure.io/freeipa/issue/7797 .

master:

  • 39eaf2f Add index and container for RFC 2307 IP services

master:

  • 0fb87bf LDAPUpdate: Batch index tasks
  • ed436e4 Add more LDAP indices
  • a34d92d Create reindex task for ipaca DB

ipa-4-6:

  • df606ff Add index and container for RFC 2307 IP services

ipa-4-7:

  • 76052d5 Add index and container for RFC 2307 IP services

ipa-4-6:

  • 55f18a8 LDAPUpdate: Batch index tasks
  • f29ab77 Add more LDAP indices
  • 26d90cd Create reindex task for ipaca DB

ipa-4-7:

  • ad37e0c LDAPUpdate: Batch index tasks
  • 903cfe3 Add more LDAP indices
  • e484c4b Create reindex task for ipaca DB

The new indices on accessRuleType, hostCategory, ipaConfigString, ipaEnabledFlag, and ipaKrbAuthzData will be available in 4.7.3 and next 4.6 update.

Thanks again for your excellent report!

Metadata Update from @cheimes:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

5 years ago

Login to comment on this ticket.

Metadata
Attachments 2