Learn more about these different git repos.
Other Git URLs
Ticket was cloned from Red Hat Bugzilla (product Red Hat Enterprise Linux 7): Bug 1421194
Description of problem: Customer has a large number of users on RHEL7 SSSD clients that are complaining about extreme slowness in name service lookups. Investigations indicate that SSSD does not appear to be using the global catalog for gidnumber lookups. SSSD is trawling through all the domains in the forest (and other trusted domains), rather that doing a global catalog search. This is causing serious performance problems on RHEL7 clients in the production AD forest. Please can you explain under what circumstances the SSSD client is supposed to use the GC and confirm whether the behaviour seen in my testing is as expected or not. Specifically could you provide information on what attributes are searched for in the GC, whether this is effective for forward (eg user->uidnumber) and reverse (eg uidnumber->user) lookups, and what is the expected behaviour if a value for the attribute is not found in the GC. Version-Release number of selected component (if applicable): python-sssdconfig-1.14.0-43.el7.noarch sssd-common-1.14.0-43.el7.x86_64 sssd-ipa-1.14.0-43.el7.x86_64 sssd-common-pac-1.14.0-43.el7.x86_64 sssd-ldap-1.14.0-43.el7.x86_64 sssd-1.14.0-43.el7.x86_64 sssd-krb5-common-1.14.0-43.el7.x86_64 sssd-krb5-1.14.0-43.el7.x86_64 sssd-ad-1.14.0-43.el7.x86_64 sssd-client-1.14.0-43.el7.x86_64 sssd-proxy-1.14.0-43.el7.x86_64 How reproducible: Reliably. Steps to Reproduce: Test Performed: # Stop SSSD, clear cache and logs, and restart [root@oo-vm5 ~]# service sssd stop Redirecting to /bin/systemctl stop sssd.service [root@oo-vm5 ~]# rm -rf /var/lib/sss/db/* /var/log/sssd/* [root@oo-vm5 ~]# service sssd start Redirecting to /bin/systemctl start sssd.service # Do a ls of a directory containing files with group owners from local domain, trusted domains, and not existing at all. [jw83113@oo-vm5 test]$ ls -lg total 0 -rw-rw-r--. 1 unixeng-access-grp@eurdev.nsrootdev.net 0 Feb 3 04:12 eurgroup -rw-rw-r--. 1 9999999 0 Feb 3 04:12 groupnotfound -rw-rw-r--. 1 jwlacgrp2@lacdev.nsrootdev.net 0 Feb 3 09:53 lacgroup -rw-rw-r--. 1 jw83113 0 Feb 3 04:12 namgroup # Look in sssd_nss.log file to see what domains are being queried (log level 0x01F0) # It can be observed that SSSD is trawling domains until it finds the gidnumber or worse when it does not exist, it tries every domain rather than checking the GC. (SSSD should know that gidnumber is replicated to GC from the schema objects) [root@oo-vm5 ~]# tail -f /var/log/sssd/sssd_nss.log [?] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [1200955690@NAMDEV.NSROOTDEV.NET] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [1200955690@NAMDEV.NSROOTDEV.NET] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [9999999@NAMDEV.NSROOTDEV.NET] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [9999999@NAMDEV.NSROOTDEV.NET] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [9999999@nsrootdev.net] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [9999999@nsrootdev.net] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [9999999@eurdev.nsrootdev.net] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [9999999@eurdev.nsrootdev.net] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [9999999@lacdev.nsrootdev.net] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [9999999@lacdev.nsrootdev.net] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [9999999@ausdev.nsrootdev.net] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [9999999@ausdev.nsrootdev.net] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0080): No matching domain found for [9999999] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [1250152112@NAMDEV.NSROOTDEV.NET] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [1250152112@NAMDEV.NSROOTDEV.NET] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [1250152112@nsrootdev.net] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [1250152112@nsrootdev.net] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [1250152112@eurdev.nsrootdev.net] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [1250152112@eurdev.nsrootdev.net] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [1290120733@NAMDEV.NSROOTDEV.NET] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [1290120733@NAMDEV.NSROOTDEV.NET] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [1290120733@nsrootdev.net] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [1290120733@nsrootdev.net] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [1290120733@eurdev.nsrootdev.net] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [1290120733@eurdev.nsrootdev.net] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [1290120733@lacdev.nsrootdev.net] (Mon Feb 6 13:14:55 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [1290120733@lacdev.nsrootdev.net] # A TCP dump shows that SSSD does communicate with the GC but then goes to each domain anyway. 13:14:55.561889 IP oo-vm5.nam.nsroot.net.56466 > namdev003.nam.nsroot.net.msft-gc: Flags [P.], seq 2020:2222, ack 3596, win 321, options [nop,nop,TS val 2718168430 ecr 113087555], length 202 13:14:55.580684 IP namdev003.nam.nsroot.net.msft-gc > oo-vm5.nam.nsroot.net.56466: Flags [P.], seq 3596:3868, ack 2222, win 258, options [nop,nop,TS val 113087586 ecr 2718168430], length 272 13:14:55.580698 IP oo-vm5.nam.nsroot.net.56466 > namdev003.nam.nsroot.net.msft-gc: Flags [.], ack 3868, win 344, options [nop,nop,TS val 2718168448 ecr 113087586], length 0 13:14:55.590267 IP oo-vm5.nam.nsroot.net.32812 > namdev003.nam.nsroot.net.ldap: Flags [P.], seq 5116:5416, ack 31133, win 836, options [nop,nop,TS val 2718168458 ecr 113087583], length 300 13:14:55.590811 IP oo-vm5.nam.nsroot.net.32812 > namdev003.nam.nsroot.net.ldap: Flags [P.], seq 5416:5716, ack 31219, win 836, options [nop,nop,TS val 2718168459 ecr 113087587], length 300 13:14:55.591781 IP oo-vm5.nam.nsroot.net.35076 > nsrootdev001.nam.nsroot.net.ldap: Flags [P.], seq 3455:3745, ack 9334, win 406, options [nop,nop,TS val 2718168460 ecr 247010140], length 290 13:14:55.592662 IP oo-vm5.nam.nsroot.net.35076 > nsrootdev001.nam.nsroot.net.ldap: Flags [P.], seq 3745:4035, ack 9970, win 416, options [nop,nop,TS val 2718168460 ecr 247010144], length 290 13:14:55.593851 IP oo-vm5.nam.nsroot.net.46548 > eurdev003.nam.nsroot.net.ldap: Flags [P.], seq 2686:2986, ack 4045, win 321, options [nop,nop,TS val 2718168462 ecr 246862079], length 300 13:14:55.594750 IP oo-vm5.nam.nsroot.net.46770 > ausdev001.nam.nsroot.net.ldap: Flags [.], ack 3466, win 299, options [nop,nop,TS val 2718168463 ecr 272910065], length 0 13:14:55.595372 IP oo-vm5.nam.nsroot.net.46548 > eurdev003.nam.nsroot.net.ldap: Flags [P.], seq 2986:3286, ack 4131, win 321, options [nop,nop,TS val 2718168463 ecr 246862082], length 300 13:14:55.596562 IP oo-vm5.nam.nsroot.net.46992 > lacdev001.nam.nsroot.net.ldap: Flags [P.], seq 2386:2686, ack 3466, win 299, options [nop,nop,TS val 2718168464 ecr 272909609], length 300 13:14:55.598262 IP oo-vm5.nam.nsroot.net.46992 > lacdev001.nam.nsroot.net.ldap: Flags [.], ack 3948, win 321, options [nop,nop,TS val 2718168466 ecr 272909622], length 0 13:14:55.630765 IP oo-vm5.nam.nsroot.net.32812 > namdev003.nam.nsroot.net.ldap: Flags [.], ack 31305, win 836, options [nop,nop,TS val 2718168499 ecr 113087587], length 0 13:14:55.632766 IP oo-vm5.nam.nsroot.net.35076 > nsrootdev001.nam.nsroot.net.ldap: Flags [.], ack 10606, win 426, options [nop,nop,TS val 2718168501 ecr 247010144], length 0 13:14:55.635786 IP oo-vm5.nam.nsroot.net.46548 > eurdev003.nam.nsroot.net.ldap: Flags [.], ack 4217, win 321, options [nop,nop,TS val 2718168504 ecr 246862082], length 0 In the production AD forest this takes 18 seconds to search for a single non-existent group. Once this goes into the negative entry cache (which I have increased the time to live), the lookup is fast because it?s fulfilled from the local cache but this initial lookup is unacceptably slow. SSSD should do one single search against the GC not trawl around all the domains. (Mon Feb 6 13:20:24 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [500008636@NAM.NSROOT.NET] (Mon Feb 6 13:20:24 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [500008636@NAM.NSROOT.NET] (Mon Feb 6 13:20:24 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [500008636@nsroot.net] (Mon Feb 6 13:20:25 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [500008636@nsroot.net] (Mon Feb 6 13:20:25 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [500008636@cardsprod.cards.citicorp.com] (Mon Feb 6 13:20:27 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [500008636@cardsprod.cards.citicorp.com] (Mon Feb 6 13:20:27 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [500008636@eur.nsroot.net] (Mon Feb 6 13:20:27 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [500008636@eur.nsroot.net] (Mon Feb 6 13:20:27 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [500008636@cbna.csdev.narb.citicorp.com] (Mon Feb 6 13:20:29 2017) [sssd[nss]] [nss_cmd_getby_dp_callback] (0x0040): Unable to get information from Data Provider Error: 3, 1432158270, Group lookup failed Will try to return what we have in cache (Mon Feb 6 13:20:29 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [500008636@jpn.nsroot.net] (Mon Feb 6 13:20:31 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [500008636@jpn.nsroot.net] (Mon Feb 6 13:20:31 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [500008636@lac.nsroot.net] (Mon Feb 6 13:20:33 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [500008636@lac.nsroot.net] (Mon Feb 6 13:20:33 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [500008636@REL.crest.dyn.nsroot.net] (Mon Feb 6 13:20:35 2017) [sssd[nss]] [nss_cmd_getby_dp_callback] (0x0040): Unable to get information from Data Provider Error: 3, 1432158270, Group lookup failed Will try to return what we have in cache (Mon Feb 6 13:20:35 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [500008636@bancocuscatlan.ubc] (Mon Feb 6 13:20:38 2017) [sssd[nss]] [nss_cmd_getby_dp_callback] (0x0040): Unable to get information from Data Provider Error: 3, 1432158270, Group lookup failed Will try to return what we have in cache (Mon Feb 6 13:20:38 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [500008636@aus.nsroot.net] (Mon Feb 6 13:20:41 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [500008636@aus.nsroot.net] (Mon Feb 6 13:20:41 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [500008636@nbs.csdev.narb.citicorp.com] (Mon Feb 6 13:20:42 2017) [sssd[nss]] [nss_cmd_getby_dp_callback] (0x0040): Unable to get information from Data Provider Error: 3, 1432158270, Group lookup failed Will try to return what we have in cache (Mon Feb 6 13:20:42 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [500008636@apac.nsroot.net] (Mon Feb 6 13:20:45 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0100): Requesting info for [500008636@apac.nsroot.net] (Mon Feb 6 13:20:45 2017) [sssd[nss]] [nss_cmd_getgrgid_search] (0x0080): No matching domain found for [500008636] Actual results: SSSD goes to global catalog, but then trawls subdomains any way. Expected results: SSSD should always use global catalog first, and only trawl subdomains if data cannot be found. Additional info: The performance problem is occuring in a production AD environment which is very large (approx 400000 users), however the behaviour has been replicated in a development AD environment too. The delays in development are much smaller because there are a lot fewer entries however the effects are the same in that many domains are being searched individually rather than a single GC query. When does the behavior occur? Frequently? Repeatedly? At certain times? Always when an entry is not in the local SSSD cache. It's repeatable. What information can you provide around timeframes and the business impact? This is an important case for the customer as it has visibility to production end users. They are managing the impact by increasing the SSSD negative entry cache time to live to reduce the frequency of these lookups but it's not a root cause fix. They need a root cause fix for this problem.
Metadata Update from @jhrozek: - Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1421194
Metadata Update from @jhrozek: - Issue priority set to: major
Metadata Update from @jhrozek: - Issue tagged with: performance
Metadata Update from @jhrozek: - Issue assigned to jhrozek
Since we are required to release a new upstream tarball no later than Friday Oct-20, I'm moving tickets that will not be closed by that date to the next milestone, 1.16.1
Metadata Update from @jhrozek: - Issue set to the milestone: SSSD 1.16.1 (was: SSSD 1.16.0)
PR: https://github.com/SSSD/sssd/pull/453
Metadata Update from @jhrozek: - Issue tagged with: PR
master:
Metadata Update from @lslebodn: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)
SSSD is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in SSSD's github repository.
This issue has been cloned to Github and is available here: - https://github.com/SSSD/sssd/issues/4494
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
subscribe
Thank you for understanding. We apologize for all inconvenience.
Login to comment on this ticket.