As an FreeIPA server administration, I expect that the default indexes shipped w/ FreeIPA are optimized for sssd's usage of FreeIPA as a backend.
sssd
Over the last few weeks, I have been working through a horrible performance issue with our production FreeIPA v4.5.4 cluster. The clients are a mix of CentOS 6 and 7 hosts all running sssd with the latest CentOS updates release for the relative platform. Outside of SSSD, we have a small handful of applications using FreeIPA for LDAP authentication.
Working with @mreynolds on Freenode, running logconv.pl --verbose /var/log/dirsrv/slapd-$REALM/access showed that we had a very high number of unindexed searches.
logconv.pl --verbose /var/log/dirsrv/slapd-$REALM/access
I'm not entirely sure on the best way to reproduce this issue, production, which sees it the worst, is currently 13,000+ ipa-clients connected to a 16 FreeIPA 4.5.4 server cluster. 1. Install FreeIPA v4.5.4 2. Connect some clients via SSSD 3. Use logconv.pl to see the unindexed searches
logconv.pl
Clients are performing hundreds of unindexed searches every minute, resulting in significant system performance degradation.
Out of the box, I would expect that the common LDAP searches performed by SSSD would be indexed and that going forward, FreeIPA/SSSD would ensure that any changes to the search pattern would remain indexed if feasible.
FreeIPA servers:
# rpm -q freeipa-server freeipa-client ipa-server ipa-client 389-ds-base pki-ca krb5-server package freeipa-server is not installed package freeipa-client is not installed ipa-server-4.5.4-10.el7.centos.4.4.x86_64 ipa-client-4.5.4-10.el7.centos.4.4.x86_64 389-ds-base-1.3.7.5-28.el7_5.x86_64 pki-ca-10.5.1-15.el7_5.noarch krb5-server-1.15.1-19.el7.x86_64
CentOS 7 Client:
# rpm -q ipa-client sssd ipa-client-4.5.4-10.el7.centos.4.4.x86_64 sssd-1.16.0-19.el7_5.8.x86_64
CentOS 6 Client:
# rpm -q ipa-client sssd ipa-client-3.0.0-51.el6.centos.x86_64 sssd-1.13.3-60.el6.x86_64
We've been able to mititagate the significant performance degradation by executing the following on all of our FreeIPA servers: 1. Create the new indexes
ldapmodify -x -D "cn=Directory Manager" -W <<EOF dn: cn=ipaenabledflag,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config changetype: add objectClass:top objectClass:nsIndex nssystemindex: false nsIndexType: eq cn: ipaenabledflag dn: cn=accessruletype,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config changetype: add objectClass:top objectClass:nsIndex nssystemindex: false nsIndexType: eq cn: accessruletype dn: cn=hostcategory,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config changetype: add objectClass:top objectClass:nsIndex nssystemindex: false nsIndexType: eq cn: hostcategory dn: cn=ipserviceport,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config changetype: add objectClass:top objectClass:nsIndex nssystemindex: false nsIndexType: eq cn: ipserviceport dn: cn=ipserviceprotocol,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config changetype: add objectClass:top objectClass:nsIndex nssystemindex: false nsIndexType: eq cn: ipserviceprotocol EOF
userRoot
db2index.pl -D 'cn=Directory Manager' -w - -n userRoot
These have now been running in production, staging, and development for over a week and the results are incredible. On average, after creating the indexes, we saw: A 15% drop in % system CPU per instance, a 46% drop in 5 minute system load per instance, a 44% drop in normalized CPU usage of the ns-slapd process all while holding the same number of client connections to the database. To help visualize this, I've attached two graphs during the time when I rolled out the indexes. Note that no other system changes were performed, only the indexes were added and userRoot was re-indexed.
ns-slapd
<img alt="max_system-load-5.png" src="/freeipa/issue/raw/files/7bd75ed34745dc0c30964a120461e9d90ece8f5a7275f6a34c74f6de7ba6cad5-max_system-load-5.png" /> <img alt="ns-slapd_cpu_normailized.png" src="/freeipa/issue/raw/files/0311b3d6646bddfdd70cad45d0f45037696244b54e8ea0b87bf6c3114b2ebbeb-ns-slapd_cpu_normailized.png" />
Finally, I've also included some of the output from logconv.pl on our access logs before the index creation. There's a TON of output, so this is just a subset of it:
----------- Access Log Output ------------ Start of Logs: 14/Nov/2018:18:29:11.243162816 End of Logs: 14/Nov/2018:19:39:05.712472728 Processed Log Time: 1 Hours, 9 Minutes, 54.46931456 Seconds Restarts: 0 Secure Protocol Versions: - TLS1.2 256-bit AES-GCM (51 connections) Peak Concurrent Connections: 36 Total Operations: 833432 Total Results: 843068 Overall Performance: 100.0% Total Connections: 5296 (1.26/sec) (75.76/min) - LDAP Connections: 5270 (1.26/sec) (75.38/min) - LDAPI Connections: 1 (0.00/sec) (0.01/min) - LDAPS Connections: 25 (0.01/sec) (0.36/min) - StartTLS Extended Ops: 26 (0.01/sec) (0.37/min) Searches: 812785 (193.78/sec) (11626.52/min) Modifications: 7734 (1.84/sec) (110.63/min) Adds: 73 (0.02/sec) (1.04/min) Deletes: 8 (0.00/sec) (0.11/min) Mod RDNs: 0 (0.00/sec) (0.00/min) Compares: 0 (0.00/sec) (0.00/min) Binds: 5242 (1.25/sec) (74.98/min) Proxied Auth Operations: 0 Persistent Searches: 0 Internal Operations: 0 Entry Operations: 0 Extended Operations: 7109 Abandoned Requests: 49 Smart Referrals Received: 0 VLV Operations: 432 VLV Unindexed Searches: 0 VLV Unindexed Components: 0 SORT Operations: 432 Entire Search Base Queries: 5882 Paged Searches: 266610 Unindexed Searches: 0 Unindexed Components: 51476
Unindexed Component #1 (notes=U) - Date/Time: 14/Nov/2018:18:29:53 - Connection Number: 3866493 - Operation Number: 12 - Etime: 0.0000208727 - Nentries: 0 - IP Address: Unknown_Host - Search Base: cn=accounts,dc=example,dc=local - Search Scope: 2 (subtree) - Search Filter: (&(ipserviceport=5141)(ipserviceprotocol=tcp)(objectclass=ipservice)) Unindexed Component #2 (notes=U) - Date/Time: 14/Nov/2018:18:31:11 - Connection Number: 3866581 - Operation Number: 27 - Etime: 1.1235918834 - Nentries: 0 - IP Address: Unknown_Host - Search Base: cn=accounts,dc=example,dc=local - Search Scope: 2 (subtree) - Search Filter: (&(ipserviceport=57672)(ipserviceprotocol=tcp)(objectclass=ipservice)) Unindexed Component #3 (notes=U) - Date/Time: 14/Nov/2018:18:30:37 - Connection Number: 3866744 - Operation Number: 516 - Etime: 0.0208673499 - Nentries: 6 - IP Address: Unknown_Host - Search Base: cn=hbac,dc=example,dc=local - Search Scope: 2 (subtree) - Search Filter: (&(objectclass=ipahbacrule)(ipaenabledflag=true)(accessruletype=allow)(|(hostcategory=all)(memberhost=fqdn=herp-abc-derp-4.ops.example.local,cn=computers,cn=accounts,dc=example,dc=local)(memberhost=cn=herp-prod,cn=hostgroups,cn=accounts,dc=example,dc=local)(memberhost=cn=herp,cn=hostgroups,cn=accounts,dc=example,dc=local)(memberhost=cn=herp,cn=ng,cn=alt,dc=example,dc=local)(memberhost=ipauniqueid=2744511c-1234-11e6-9318-0656f5e1b0d2,cn=hbac,dc=example,dc=local)(memberhost=cn=herp-prod,cn=ng,cn=alt,dc=example,dc=local)(memberhost=ipauniqueid=3a75e6ba-1234-11e6-93f5-0656f5e1b0d2,cn=hbac,dc=example,dc=local))) Unindexed Component #4 (notes=U) - Date/Time: 14/Nov/2018:18:31:11 - Connection Number: 3866763 - Operation Number: 513 - Etime: 6.0482156752 - Nentries: 6 - IP Address: Unknown_Host - Search Base: cn=hbac,dc=example,dc=local - Search Scope: 2 (subtree) - Search Filter: (&(objectclass=ipahbacrule)(ipaenabledflag=true)(accessruletype=allow)(|(hostcategory=all)(memberhost=fqdn=blah-abc-derp-1.ops.example.local,cn=computers,cn=accounts,dc=example,dc=local)(memberhost=cn=blah-prod,cn=hostgroups,cn=accounts,dc=example,dc=local)(memberhost=cn=blah,cn=hostgroups,cn=accounts,dc=example,dc=local)(memberhost=cn=blah,cn=ng,cn=alt,dc=example,dc=local)(memberhost=ipauniqueid=f9f964bc-e70c-1234-8474-0656f5e1b0d2,cn=hbac,dc=example,dc=local)(memberhost=cn=blah-prod,cn=ng,cn=alt,dc=example,dc=local)(memberhost=ipauniqueid=0d496b52-e70d-11e6-8c4d-0656f5e1b0d2,cn=hbac,dc=example,dc=local))) Unindexed Component #5 (notes=U) - Date/Time: 14/Nov/2018:18:32:35 - Connection Number: 3866973 - Operation Number: 336 - Etime: 0.0000372680 - Nentries: 0 - IP Address: Unknown_Host - Search Base: cn=accounts,dc=example,dc=local - Search Scope: 2 (subtree) - Search Filter: (&(ipserviceport=57068)(ipserviceprotocol=tcp)(objectclass=ipservice))
FDs Taken: 5296 FDs Returned: 5271 Highest FD Taken: 1285 Broken Pipes: 0 Connections Reset By Peer: 0 Resource Unavailable: 0 Max BER Size Exceeded: 0 Binds: 5242 Unbinds: 5247 ---------------------------------- - LDAP v2 Binds: 22 - LDAP v3 Binds: 5220 - AUTOBINDs(LDAPI): 0 - SSL Client Binds: 0 - Failed SSL Client Binds: 0 - SASL Binds: 5040 - GSSAPI: 5034 - GSS-SPNEGO: 6 - Directory Manager Binds: 2 - Anonymous Binds: 174 ----- Connection Latency Details ----- (in seconds) <=1 2 3 4-5 6-10 11-15 >15 -------------------------------------------------------------------------- (# of connections) 164 3 1 5092 ----- Top 20 Most Frequent etimes ----- 7 etime=0.0000224061 7 etime=0.0000290628 7 etime=0.0000294603 7 etime=0.0000218388 7 etime=0.0000228430 6 etime=0.0000225754 6 etime=0.0000245770 6 etime=0.0000284386 6 etime=0.0000221460 6 etime=0.0000220304 6 etime=0.0000208112 6 etime=0.0000228412 6 etime=0.0000223591 6 etime=0.0000102264 6 etime=0.0000257960 6 etime=0.0000224595 6 etime=0.0000232533 6 etime=0.0000232484 6 etime=0.0000228629 6 etime=0.0000254638 ----- Top 20 Longest etimes ----- etime=17.0174788268 1 etime=7.1987212142 1 etime=7.1983560048 1 etime=7.1962944860 1 etime=7.1962029311 1 etime=7.1959907860 1 etime=7.1933721133 1 etime=7.1918061226 1 etime=7.1913044993 1 etime=7.1908861293 1 etime=7.1906954259 1 etime=7.1904359425 1 etime=7.1886884774 1 etime=7.1873965534 1 etime=7.1873539693 1 etime=7.1853073733 1 etime=7.1850723668 1 etime=7.1844387007 1 etime=7.1829720821 1 etime=7.1771563550 1 ----- Top 20 Largest nentries ----- nentries=2000 420 nentries=934 6 nentries=15 36796 nentries=14 10 nentries=10 39 nentries=6 33994 nentries=5 2797 nentries=4 2 nentries=3 1400 nentries=2 36793 nentries=1 434767 nentries=0 265761 ----- Top 20 Most returned nentries ----- 434767 nentries=1 265761 nentries=0 36796 nentries=15 36793 nentries=2 33994 nentries=6 2797 nentries=5 1400 nentries=3 420 nentries=2000 39 nentries=10 10 nentries=14 6 nentries=934 2 nentries=4 ----- Extended Operations ----- 4082 2.16.840.1.113730.3.5.12 DS90 Start Replication Request 2991 2.16.840.1.113730.3.5.5 End Replication Request (incremental update) 26 1.3.6.1.4.1.1466.20037 Start TLS 10 1.3.6.1.4.1.4203.1.11.3 Other ----- Top 20 Most Requested Attributes ----- 582185 objectClass 532315 cn 494927 ipaUniqueID 340469 memberOf 301557 entryusn 278866 ipaNTSecurityIdentifier 278541 gidNumber 272112 modifyTimestamp 272112 userPassword 247421 krbPrincipalName 247420 krbLastPwdChange 247420 krbPasswordExpiration 241536 ipaUserAuthType 240991 nsAccountLock 229843 ipaSshPubKey 198242 uid 190444 authorizedService 190444 homeDirectory 190444 host 190444 loginallowedtimemap
----- Recommendations ----- 1. You have unindexed components, this can be caused from a search on an unindexed attribute, or your returned results exceeded the allidsthreshold. Unindexed components are not recommended. To refuse unindexed searches, switch 'nsslap 2. You have a majority of etimes that are greater than zero, you may want to investigate this performance problem. Cleaning up temp files... Done.
More than happy to provide any other information that might be useful, this is been a super interesting experience to work through (while the impact was not ideal). Major thanks to @mreynolds for all of the help as well!
Thank you for this excellent and thorough analysis! The numbers and graphs make an appealing point for the additional indexes. I have seen the same missing indexes in https://pagure.io/freeipa/issue/6722 but never realized, that the missing indexes had such a big impact on a large installation.
@jhrozek please take a look.
Metadata Update from @cheimes: - Issue priority set to: important - Issue set to the milestone: FreeIPA 4.6.5
I'm not sure what exactly should I add except yes, these searches appear to be coming from SSSD. In particular:
(&(ipserviceport=5141)(ipserviceprotocol=tcp)(objectclass=ipservice)) This looks like getservbyport(2). If you're not storing services in IPA/LDAP, then a quick mitigation might be to remove the sss module from the services line in /etc/nsswitch.conf.
(&(ipserviceport=5141)(ipserviceprotocol=tcp)(objectclass=ipservice))
getservbyport(2)
sss
services
/etc/nsswitch.conf
(&(objectclass=ipahbacrule)(ipaenabledflag=true)(accessruletype=allow)(|(hostcategory=all)(memberhost=fqdn=herp-abc-derp-4.ops.example.local,cn=computers,cn=accounts,dc=example,dc=local)(memberhost=cn=herp-prod,cn=hostgroups,cn=accounts,dc=example,dc=local)(memberhost=cn=herp,cn=hostgroups,cn=accounts,dc=example,dc=local)(memberhost=cn=herp,cn=ng,cn=alt,dc=example,dc=local)(memberhost=ipauniqueid=2744511c-1234-11e6-9318-0656f5e1b0d2,cn=hbac,dc=example,dc=local)(memberhost=cn=herp-prod,cn=ng,cn=alt,dc=example,dc=local)(memberhost=ipauniqueid=3a75e6ba-1234-11e6-93f5-0656f5e1b0d2,cn=hbac,dc=example,dc=local))) This is HBAC access control. I'm not so well-versed in the index schema syntax, but as long as you added the hostcategory, ipaenabled and accessruletype attribute indexes, that sounds sane to me
(&(objectclass=ipahbacrule)(ipaenabledflag=true)(accessruletype=allow)(|(hostcategory=all)(memberhost=fqdn=herp-abc-derp-4.ops.example.local,cn=computers,cn=accounts,dc=example,dc=local)(memberhost=cn=herp-prod,cn=hostgroups,cn=accounts,dc=example,dc=local)(memberhost=cn=herp,cn=hostgroups,cn=accounts,dc=example,dc=local)(memberhost=cn=herp,cn=ng,cn=alt,dc=example,dc=local)(memberhost=ipauniqueid=2744511c-1234-11e6-9318-0656f5e1b0d2,cn=hbac,dc=example,dc=local)(memberhost=cn=herp-prod,cn=ng,cn=alt,dc=example,dc=local)(memberhost=ipauniqueid=3a75e6ba-1234-11e6-93f5-0656f5e1b0d2,cn=hbac,dc=example,dc=local)))
hostcategory
ipaenabled
accessruletype
@jhrozek You just added all information that I was looking for and even more.
The index definitions look sane to me. 389-DS has multiple types of indexes. There are
eq
pres
(attr=*)
sub
aprox
and two more index types for i18n and browsing. Since SSSD uses full matches, the eq index type is sufficient for these attributes.
For your information, I just created two more issues related to missing indexes:
*
Metadata Update from @cheimes: - Issue tagged with: performance
The indices on ipaEnableFlag and hostCategory are for entries that are based on the abstract object class ipaAssociation. The structural classes include ipaCaAcl, ipaSudoRule, ipaSELinuxUserMap, ipaNISNetgroup, and ipaHBACRule. These entries have two more attributes besides hostCategory that may need to get indexed: serviceCategory and userCategory. The other ipaAssociation attributes memberHost, memberService, and memberUser are already indexed.
ipaEnableFlag
hostCategory
ipaAssociation
ipaCaAcl
ipaSudoRule
ipaSELinuxUserMap
ipaNISNetgroup
ipaHBACRule
serviceCategory
userCategory
memberHost
memberService
memberUser
I could not find any filtering on serviceCategory and userCategory in SSSD or IPA. The additional indices may not be required.
Metadata Update from @cheimes: - Custom field on_review adjusted to https://github.com/freeipa/freeipa/pull/2649 - Issue assigned to cheimes
The userCategory and serviceCategory attributes are used when processing sudo rules from IPA, internally SSSD converts them into a common format for processing. But there is no filter that includes these attributes, they are just requested and later processed.
I moved the ipService related performance problem into ticket https://pagure.io/freeipa/issue/7797 .
ipService
master:
ipa-4-6:
ipa-4-7:
The new indices on accessRuleType, hostCategory, ipaConfigString, ipaEnabledFlag, and ipaKrbAuthzData will be available in 4.7.3 and next 4.6 update.
Thanks again for your excellent report!
Metadata Update from @cheimes: - Issue close_status updated to: fixed - Issue status updated to: Closed (was: Open)
Login to comment on this ticket.