#4062 id command takes minutes to return a result for external users
Closed: cloned-to-github 3 years ago by pbrezina. Opened 4 years ago by rgp.

Issue

When I attempt to run id someuser (AD user) on CentOS host enrolled to FreeIPA it can take anywhere from 3 minutes to 17 minutes to get a result. In the meantime the host seems to flood the master servers with LDAP queries. Removing sssd cache files rm -f /var/lib/sss/db/* and restarting sssd resolves the issue temporary. Running id again takes couple of seconds. The behaviour returns after a while, maybe 2 hours. I have not tried to time it more precisely. In the meantime the client has not been touched.

Running id for the same user in the freeipa masters works with little to no delay.

Details about the setup:

The FreeIPA domain running on 2 servers with a trust to AD domain. The AD domain is a decent size and has over 1000 users, over 1000 computers, and many groups. The servers are not any load, there is plenty of resource. There is no network connectivity issues.

Steps to Reproduce

  1. FreeIPA domain with trust to large AD domain
  2. Enroll client with ipa-client-install
  3. Wait couple of hours
  4. run id aduser

Actual behavior

Running id aduser takes many minutes to return a result

Expected behavior

Running id aduser return result almost instantly

Version/Release/Distribution

client:

# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
# rpm -q ipa-client
ipa-client-4.6.4-10.el7.centos.3.x86_64

servers:

# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
# rpm -q  ipa-server ipa-client 389-ds-base pki-ca krb5-server
ipa-server-4.6.4-10.el7.centos.2.x86_64
ipa-client-4.6.4-10.el7.centos.2.x86_64
389-ds-base-1.3.8.4-22.el7_6.x86_64
pki-ca-10.5.9-6.el7.noarch
krb5-server-1.15.1-37.el7_6.x86_64

Additional info:

# time id aduser
uid=.......

real    3m21.593s
user    0m0.001s
sys     0m0.009s

Wait 30 minutes and run the command again:

# time id aduser
uid=.......

real    17m53.673s
user    0m0.006s
sys     0m0.014s

/var/log/sssd/sssd_intra.company.com.log

# sssctl debug-level 4

(Wed Aug 14 01:50:49 2019) [sssd[be[intra.company.com]]] [sdap_id_conn_data_expire_handler] (0x0080): connection is about to expire, releasing it
(Wed Aug 14 01:50:55 2019) [sssd[be[intra.company.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA'
(Wed Aug 14 01:50:55 2019) [sssd[be[intra.company.com]]] [collapse_srv_lookup] (0x0100): Need to refresh SRV lookup for domain intra.company.com
(Wed Aug 14 01:50:55 2019) [sssd[be[intra.company.com]]] [resolv_getsrv_send] (0x0100): Trying to resolve SRV record of '_ldap._tcp.intra.company.com'
(Wed Aug 14 01:50:55 2019) [sssd[be[intra.company.com]]] [set_srv_data_status] (0x0100): Marking SRV lookup of service 'IPA' as 'resolved'
(Wed Aug 14 01:50:55 2019) [sssd[be[intra.company.com]]] [get_server_status] (0x0100): Hostname resolution expired, resetting the server status of 'freeipa-02.intra.company.com'
(Wed Aug 14 01:50:55 2019) [sssd[be[intra.company.com]]] [set_server_common_status] (0x0100): Marking server 'freeipa-02.intra.company.com' as 'name not resolved'
(Wed Aug 14 01:50:55 2019) [sssd[be[intra.company.com]]] [resolv_gethostbyname_files_send] (0x0100): Trying to resolve A record of 'freeipa-02.intra.company.com' in files
(Wed Aug 14 01:50:55 2019) [sssd[be[intra.company.com]]] [set_server_common_status] (0x0100): Marking server 'freeipa-02.intra.company.com' as 'resolving name'
(Wed Aug 14 01:50:55 2019) [sssd[be[intra.company.com]]] [resolv_gethostbyname_files_send] (0x0100): Trying to resolve AAAA record of 'freeipa-02.intra.company.com' in files
(Wed Aug 14 01:50:55 2019) [sssd[be[intra.company.com]]] [resolv_gethostbyname_dns_query] (0x0100): Trying to resolve A record of 'freeipa-02.intra.company.com' in DNS
(Wed Aug 14 01:50:55 2019) [sssd[be[intra.company.com]]] [set_server_common_status] (0x0100): Marking server 'freeipa-02.intra.company.com' as 'name resolved'
(Wed Aug 14 01:50:55 2019) [sssd[be[intra.company.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA'
(Wed Aug 14 01:50:55 2019) [sssd[be[intra.company.com]]] [child_sig_handler] (0x0100): child [23427] finished successfully.
(Wed Aug 14 01:50:55 2019) [sssd[be[intra.company.com]]] [sdap_cli_auth_step] (0x0100): expire timeout is 900
(Wed Aug 14 01:50:55 2019) [sssd[be[intra.company.com]]] [sasl_bind_send] (0x0100): Executing sasl bind mech: GSSAPI, user: host/c7client.intra.company.com
(Wed Aug 14 01:50:55 2019) [sssd[be[intra.company.com]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'freeipa-02.intra.company.com' as 'working'
(Wed Aug 14 01:50:55 2019) [sssd[be[intra.company.com]]] [set_server_common_status] (0x0100): Marking server 'freeipa-02.intra.company.com' as 'working'
(Wed Aug 14 01:50:55 2019) [sssd[be[intra.company.com]]] [ipa_sudo_fetch_addtl_cmdgroups_done] (0x0040): Received 2 additional command groups
(Wed Aug 14 01:50:55 2019) [sssd[be[intra.company.com]]] [ipa_sudo_fetch_rules_done] (0x0040): Received 0 sudo rules
(Wed Aug 14 02:05:49 2019) [sssd[be[intra.company.com]]] [sdap_id_conn_data_expire_handler] (0x0080): connection is about to expire, releasing it
(Wed Aug 14 02:05:55 2019) [sssd[be[intra.company.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA'
(Wed Aug 14 02:05:55 2019) [sssd[be[intra.company.com]]] [collapse_srv_lookup] (0x0100): Need to refresh SRV lookup for domain intra.company.com
(Wed Aug 14 02:05:55 2019) [sssd[be[intra.company.com]]] [resolv_getsrv_send] (0x0100): Trying to resolve SRV record of '_ldap._tcp.intra.company.com'
(Wed Aug 14 02:05:55 2019) [sssd[be[intra.company.com]]] [set_srv_data_status] (0x0100): Marking SRV lookup of service 'IPA' as 'resolved'
(Wed Aug 14 02:05:55 2019) [sssd[be[intra.company.com]]] [resolv_gethostbyname_files_send] (0x0100): Trying to resolve A record of 'freeipa-01.intra.company.com' in files
(Wed Aug 14 02:05:55 2019) [sssd[be[intra.company.com]]] [set_server_common_status] (0x0100): Marking server 'freeipa-01.intra.company.com' as 'resolving name'
(Wed Aug 14 02:05:55 2019) [sssd[be[intra.company.com]]] [resolv_gethostbyname_files_send] (0x0100): Trying to resolve AAAA record of 'freeipa-01.intra.company.com' in files
(Wed Aug 14 02:05:55 2019) [sssd[be[intra.company.com]]] [resolv_gethostbyname_dns_query] (0x0100): Trying to resolve A record of 'freeipa-01.intra.company.com' in DNS
(Wed Aug 14 02:05:55 2019) [sssd[be[intra.company.com]]] [set_server_common_status] (0x0100): Marking server 'freeipa-01.intra.company.com' as 'name resolved'
(Wed Aug 14 02:05:55 2019) [sssd[be[intra.company.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA'
(Wed Aug 14 02:05:55 2019) [sssd[be[intra.company.com]]] [child_sig_handler] (0x0100): child [23527] finished successfully.
(Wed Aug 14 02:05:55 2019) [sssd[be[intra.company.com]]] [sdap_cli_auth_step] (0x0100): expire timeout is 900
(Wed Aug 14 02:05:55 2019) [sssd[be[intra.company.com]]] [sasl_bind_send] (0x0100): Executing sasl bind mech: GSSAPI, user: host/c7client.intra.company.com
(Wed Aug 14 02:05:55 2019) [sssd[be[intra.company.com]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'freeipa-01.intra.company.com' as 'working'
(Wed Aug 14 02:05:55 2019) [sssd[be[intra.company.com]]] [set_server_common_status] (0x0100): Marking server 'freeipa-01.intra.company.com' as 'working'
(Wed Aug 14 02:05:55 2019) [sssd[be[intra.company.com]]] [ipa_sudo_fetch_addtl_cmdgroups_done] (0x0040): Received 2 additional command groups
(Wed Aug 14 02:05:55 2019) [sssd[be[intra.company.com]]] [ipa_sudo_fetch_rules_done] (0x0040): Received 1 sudo rules
(Wed Aug 14 02:07:30 2019) [sssd[be[intra.company.com]]] [ipa_id_get_account_info_orig_done] (0x0080): Object not found, ending request

# host has been idle for a while, and I run a look for an AD user: id jsmith

(Wed Aug 14 02:07:36 2019) [sssd[be[intra.company.com]]] [sysdb_mod_group_member] (0x0080): ldb_modify failed: [Attribute or value exists](20)[attribute 'member': value #17 on 'name=jsmith@company.com,cn=users,cn=company.com,cn=sysdb' already exists]
(Wed Aug 14 02:07:36 2019) [sssd[be[intra.company.com]]] [sysdb_update_members_ex] (0x0020): Could not add member [jsmith@company.com] to group [name=jsmith@company.com,cn=users,cn=company.com,cn=sysdb]. Skipping.
(Wed Aug 14 02:07:36 2019) [sssd[be[intra.company.com]]] [ipa_id_get_account_info_orig_done] (0x0080): Object not found, ending request
(Wed Aug 14 02:07:36 2019) [sssd[be[intra.company.com]]] [ipa_id_get_account_info_orig_done] (0x0080): Object not found, ending request
(Wed Aug 14 02:07:42 2019) [sssd[be[intra.company.com]]] [ipa_s2n_get_list_next] (0x0040): s2n exop request failed.
(Wed Aug 14 02:07:42 2019) [sssd[be[intra.company.com]]] [ipa_s2n_get_list_done] (0x0040): s2n get_fqlist request failed.
(Wed Aug 14 02:07:42 2019) [sssd[be[intra.company.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA'
(Wed Aug 14 02:07:42 2019) [sssd[be[intra.company.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA'
(Wed Aug 14 02:07:42 2019) [sssd[be[intra.company.com]]] [child_sig_handler] (0x0100): child [23531] finished successfully.
(Wed Aug 14 02:07:42 2019) [sssd[be[intra.company.com]]] [sdap_cli_auth_step] (0x0100): expire timeout is 900
(Wed Aug 14 02:07:42 2019) [sssd[be[intra.company.com]]] [sasl_bind_send] (0x0100): Executing sasl bind mech: GSSAPI, user: host/c7client.intra.company.com
(Wed Aug 14 02:07:42 2019) [sssd[be[intra.company.com]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'freeipa-02.intra.company.com' as 'working'
(Wed Aug 14 02:07:42 2019) [sssd[be[intra.company.com]]] [set_server_common_status] (0x0100): Marking server 'freeipa-02.intra.company.com' as 'working'
(Wed Aug 14 02:07:48 2019) [sssd[be[intra.company.com]]] [ipa_s2n_get_list_next] (0x0040): s2n exop request failed.
(Wed Aug 14 02:07:48 2019) [sssd[be[intra.company.com]]] [ipa_s2n_get_list_done] (0x0040): s2n get_fqlist request failed.
(Wed Aug 14 02:07:48 2019) [sssd[be[intra.company.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA'
(Wed Aug 14 02:07:48 2019) [sssd[be[intra.company.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA'
(Wed Aug 14 02:07:48 2019) [sssd[be[intra.company.com]]] [child_sig_handler] (0x0100): child [23533] finished successfully.
(Wed Aug 14 02:07:48 2019) [sssd[be[intra.company.com]]] [sdap_cli_auth_step] (0x0100): expire timeout is 900
(Wed Aug 14 02:07:48 2019) [sssd[be[intra.company.com]]] [sasl_bind_send] (0x0100): Executing sasl bind mech: GSSAPI, user: host/c7client.intra.company.com
(Wed Aug 14 02:07:48 2019) [sssd[be[intra.company.com]]] [fo_set_port_status] (0x0100): Marking port 0 of server 'freeipa-02.intra.company.com' as 'working'
(Wed Aug 14 02:07:48 2019) [sssd[be[intra.company.com]]] [set_server_common_status] (0x0100): Marking server 'freeipa-02.intra.company.com' as 'working'
(Wed Aug 14 02:07:52 2019) [sssd[be[intra.company.com]]] [sysdb_mod_group_member] (0x0080): ldb_modify failed: [No such object](32)[ldb_wait from ldb_modify with LDB_WAIT_ALL: No such object (32)]
(Wed Aug 14 02:07:52 2019) [sssd[be[intra.company.com]]] [sysdb_update_members_ex] (0x0020): Could not add member [bbury@company.com] to group [name=somegroup1@company.com,cn=groups,cn=company.com,cn=sysdb]. Skipping.
(Wed Aug 14 02:07:52 2019) [sssd[be[intra.company.com]]] [sysdb_mod_group_member] (0x0080): ldb_modify failed: [No such object](32)[ldb_wait from ldb_modify with LDB_WAIT_ALL: No such object (32)]
(Wed Aug 14 02:07:52 2019) [sssd[be[intra.company.com]]] [sysdb_update_members_ex] (0x0020): Could not add member [bbury@company.com] to group [name=somegroup2@company.com,cn=groups,cn=company.com,cn=sysdb]. Skipping.

# snip: these line repeat thousand of times (with different user/groups)
# occasionally SSSD seems to reconnect

(Wed Aug 14 02:08:30 2019) [sssd[be[intra.company.com]]] [ipa_s2n_get_list_next] (0x0040): s2n exop request failed.
(Wed Aug 14 02:08:30 2019) [sssd[be[intra.company.com]]] [ipa_s2n_get_list_done] (0x0040): s2n get_fqlist request failed.
(Wed Aug 14 02:08:30 2019) [sssd[be[intra.company.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA'
(Wed Aug 14 02:08:30 2019) [sssd[be[intra.company.com]]] [get_port_status] (0x0080): SSSD is unable to complete the full connection request, this internal status does not necessarily indicate network port issues.
(Wed Aug 14 02:08:30 2019) [sssd[be[intra.company.com]]] [get_port_status] (0x0100): Resetting the status of port 389 for server 'freeipa-01.intra.company.com'
(Wed Aug 14 02:08:30 2019) [sssd[be[intra.company.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA'
(Wed Aug 14 02:08:30 2019) [sssd[be[intra.company.com]]] [child_sig_handler] (0x0100): child [23534] finished successfully.
(Wed Aug 14 02:08:30 2019) [sssd[be[intra.company.com]]] [sdap_cli_auth_step] (0x0100): expire timeout is 900
(Wed Aug 14 02:08:30 2019) [sssd[be[intra.company.com]]] [sasl_bind_send] (0x0100): Executing sasl bind mech: GSSAPI, user: host/c7client.intra.company.com
(Wed Aug 14 02:08:30 2019) [sssd[be[intra.company.com]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'freeipa-01.intra.company.com' as 'working'
(Wed Aug 14 02:08:30 2019) [sssd[be[intra.company.com]]] [set_server_common_status] (0x0100): Marking server 'freeipa-01.intra.company.com' as 'working'
(Wed Aug 14 02:08:36 2019) [sssd[be[intra.company.com]]] [ipa_s2n_get_list_next] (0x0040): s2n exop request failed.
(Wed Aug 14 02:08:36 2019) [sssd[be[intra.company.com]]] [ipa_s2n_get_list_done] (0x0040): s2n get_fqlist request failed.
(Wed Aug 14 02:08:36 2019) [sssd[be[intra.company.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA'
(Wed Aug 14 02:08:36 2019) [sssd[be[intra.company.com]]] [get_port_status] (0x0080): SSSD is unable to complete the full connection request, this internal status does not necessarily indicate network port issues.
(Wed Aug 14 02:08:36 2019) [sssd[be[intra.company.com]]] [get_port_status] (0x0100): Resetting the status of port 389 for server 'freeipa-02.intra.company.com'
(Wed Aug 14 02:08:36 2019) [sssd[be[intra.company.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA'
(Wed Aug 14 02:08:36 2019) [sssd[be[intra.company.com]]] [child_sig_handler] (0x0100): child [23535] finished successfully.
(Wed Aug 14 02:08:36 2019) [sssd[be[intra.company.com]]] [sdap_cli_auth_step] (0x0100): expire timeout is 900
(Wed Aug 14 02:08:36 2019) [sssd[be[intra.company.com]]] [sasl_bind_send] (0x0100): Executing sasl bind mech: GSSAPI, user: host/c7client.intra.company.com
(Wed Aug 14 02:08:36 2019) [sssd[be[intra.company.com]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'freeipa-02.intra.company.com' as 'working'
(Wed Aug 14 02:08:36 2019) [sssd[be[intra.company.com]]] [set_server_common_status] (0x0100): Marking server 'freeipa-02.intra.company.com' as 'working'
(Wed Aug 14 02:08:42 2019) [sssd[be[intra.company.com]]] [sysdb_mod_group_member] (0x0080): ldb_modify failed: [No such object](32)[ldb_wait from ldb_modify with LDB_WAIT_ALL: No such object (32)]
(Wed Aug 14 02:08:42 2019) [sssd[be[intra.company.com]]] [sysdb_update_members_ex] (0x0020): Could not add member [tom@company.com] to group [name=anothergroup1@company.com,cn=groups,cn=company.com,cn=sysdb]. Skipping.
(Wed Aug 14 02:08:42 2019) [sssd[be[intra.company.com]]] [sysdb_mod_group_member] (0x0080): ldb_modify failed: [No such object](32)[ldb_wait from ldb_modify with LDB_WAIT_ALL: No such object (32)]
(Wed Aug 14 02:08:42 2019) [sssd[be[intra.company.com]]] [sysdb_update_members_ex] (0x0020): Could not add member [tom@company.com] to group [name=exchange group@company.com,cn=groups,cn=company.com,cn=sysdb]. Skipping.

# snip: these line repeat thousand of times (with different user/groups)

# grep -c 'ldb_modify failed' sssd_id_delay.log
13865
# grep -c ' s2n exop request failed' sssd_id_delay.log
52

# eventually id returns

(Wed Aug 14 02:19:19 2019) [sssd[be[intra.company.com]]] [ipa_s2n_get_list_next] (0x0040): s2n exop request failed.
(Wed Aug 14 02:19:19 2019) [sssd[be[intra.company.com]]] [ipa_s2n_get_list_done] (0x0040): s2n get_fqlist request failed.
(Wed Aug 14 02:19:19 2019) [sssd[be[intra.company.com]]] [ipa_subdomain_account_done] (0x0040): ipa_get_*_acct request failed: [11]: Resource temporarily unavailable.
(Wed Aug 14 02:19:19 2019) [sssd[be[intra.company.com]]] [ipa_s2n_get_list_next] (0x0040): s2n exop request failed.
(Wed Aug 14 02:19:19 2019) [sssd[be[intra.company.com]]] [ipa_s2n_get_list_done] (0x0040): s2n get_fqlist request failed.
(Wed Aug 14 02:19:19 2019) [sssd[be[intra.company.com]]] [ipa_subdomain_account_done] (0x0040): ipa_get_*_acct request failed: [11]: Resource temporarily unavailable.
(Wed Aug 14 02:20:18 2019) [sssd[be[intra.company.com]]] [set_srv_data_status] (0x0100): Marking SRV lookup of service 'IPA' as 'neutral'
(Wed Aug 14 02:20:18 2019) [sssd[be[intra.company.com]]] [set_server_common_status] (0x0100): Marking server 'freeipa-01.intra.company.com' as 'name not resolved'
(Wed Aug 14 02:20:18 2019) [sssd[be[intra.company.com]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'freeipa-01.intra.company.com' as 'neutral'
(Wed Aug 14 02:20:18 2019) [sssd[be[intra.company.com]]] [set_srv_data_status] (0x0100): Marking SRV lookup of service 'IPA' as 'neutral'
(Wed Aug 14 02:20:18 2019) [sssd[be[intra.company.com]]] [set_server_common_status] (0x0100): Marking server 'freeipa-02.intra.company.com' as 'name not resolved'
(Wed Aug 14 02:20:18 2019) [sssd[be[intra.company.com]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'freeipa-02.intra.company.com' as 'neutral'
(Wed Aug 14 02:20:18 2019) [sssd[be[intra.company.com]]] [set_server_common_status] (0x0100): Marking server 'freeipa-02.intra.company.com' as 'name not resolved'
(Wed Aug 14 02:20:18 2019) [sssd[be[intra.company.com]]] [fo_set_port_status] (0x0100): Marking port 0 of server 'freeipa-02.intra.company.com' as 'neutral'
(Wed Aug 14 02:20:18 2019) [sssd[be[intra.company.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA'
(Wed Aug 14 02:20:18 2019) [sssd[be[intra.company.com]]] [collapse_srv_lookup] (0x0100): Need to refresh SRV lookup for domain intra.company.com
(Wed Aug 14 02:20:18 2019) [sssd[be[intra.company.com]]] [resolv_getsrv_send] (0x0100): Trying to resolve SRV record of '_ldap._tcp.intra.company.com'
(Wed Aug 14 02:20:18 2019) [sssd[be[intra.company.com]]] [set_srv_data_status] (0x0100): Marking SRV lookup of service 'IPA' as 'resolved'
(Wed Aug 14 02:20:18 2019) [sssd[be[intra.company.com]]] [resolv_gethostbyname_files_send] (0x0100): Trying to resolve A record of 'freeipa-01.intra.company.com' in files
(Wed Aug 14 02:20:18 2019) [sssd[be[intra.company.com]]] [set_server_common_status] (0x0100): Marking server 'freeipa-01.intra.company.com' as 'resolving name'
(Wed Aug 14 02:20:18 2019) [sssd[be[intra.company.com]]] [resolv_gethostbyname_files_send] (0x0100): Trying to resolve AAAA record of 'freeipa-01.intra.company.com' in files
(Wed Aug 14 02:20:18 2019) [sssd[be[intra.company.com]]] [resolv_gethostbyname_dns_query] (0x0100): Trying to resolve A record of 'freeipa-01.intra.company.com' in DNS
(Wed Aug 14 02:20:18 2019) [sssd[be[intra.company.com]]] [set_server_common_status] (0x0100): Marking server 'freeipa-01.intra.company.com' as 'name resolved'
(Wed Aug 14 02:20:18 2019) [sssd[be[intra.company.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA'
(Wed Aug 14 02:20:18 2019) [sssd[be[intra.company.com]]] [child_sig_handler] (0x0100): child [23574] finished successfully.
(Wed Aug 14 02:20:18 2019) [sssd[be[intra.company.com]]] [sdap_cli_auth_step] (0x0100): expire timeout is 900
(Wed Aug 14 02:20:18 2019) [sssd[be[intra.company.com]]] [sasl_bind_send] (0x0100): Executing sasl bind mech: GSSAPI, user: host/c7client.intra.company.com
(Wed Aug 14 02:20:19 2019) [sssd[be[intra.company.com]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'freeipa-01.intra.company.com' as 'working'
(Wed Aug 14 02:20:19 2019) [sssd[be[intra.company.com]]] [set_server_common_status] (0x0100): Marking server 'freeipa-01.intra.company.com' as 'working'
(Wed Aug 14 02:20:19 2019) [sssd[be[intra.company.com]]] [be_run_online_cb] (0x0080): Going online. Running callbacks.
(Wed Aug 14 02:20:19 2019) [sssd[be[intra.company.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA'
(Wed Aug 14 02:20:19 2019) [sssd[be[intra.company.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA'
(Wed Aug 14 02:20:19 2019) [sssd[be[intra.company.com]]] [sdap_cli_auth_step] (0x0100): expire timeout is 900
(Wed Aug 14 02:20:19 2019) [sssd[be[intra.company.com]]] [sasl_bind_send] (0x0100): Executing sasl bind mech: GSSAPI, user: host/c7client.intra.company.com
(Wed Aug 14 02:20:19 2019) [sssd[be[intra.company.com]]] [child_sig_handler] (0x0100): child [23575] finished successfully.
(Wed Aug 14 02:20:19 2019) [sssd[be[intra.company.com]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'freeipa-01.intra.company.com' as 'working'
(Wed Aug 14 02:20:19 2019) [sssd[be[intra.company.com]]] [set_server_common_status] (0x0100): Marking server 'freeipa-01.intra.company.com' as 'working'
(Wed Aug 14 02:20:19 2019) [sssd[be[intra.company.com]]] [ipa_sudo_fetch_rules_done] (0x0040): Received 1 sudo rules
(Wed Aug 14 02:20:19 2019) [sssd[be[intra.company.com]]] [ipa_enable_enterprise_principals] (0x0100): Enterprise principals enabled.
(Wed Aug 14 02:20:19 2019) [sssd[be[intra.company.com]]] [resolv_gethostbyname_dns_query] (0x0100): Trying to resolve A record of 'c7client.intra.company.com' in DNS
(Wed Aug 14 02:20:19 2019) [sssd[be[intra.company.com]]] [resolv_gethostbyname_dns_query] (0x0100): Trying to resolve AAAA record of 'c7client.intra.company.com' in DNS
(Wed Aug 14 02:20:19 2019) [sssd[be[intra.company.com]]] [resolv_gethostbyname_next] (0x0100): No more hosts databases to retry
(Wed Aug 14 02:20:19 2019) [sssd[be[intra.company.com]]] [ipa_dyndns_nsupdate_done] (0x0040): DNS update finished

/etc/sssd/sssd.conf

[domain/intra.company.com]

cache_credentials = True
krb5_store_password_if_offline = True
ipa_domain = intra.company.com
id_provider = ipa
auth_provider = ipa
access_provider = ipa
ipa_hostname = c7client.intra.company.com
chpass_provider = ipa
dyndns_update = True
ipa_server = _srv_, freeipa-01.intra.company.com
dyndns_iface = ens192
ldap_tls_cacert = /etc/ipa/ca.crt
debug_level = 4

[sssd]
services = nss, sudo, pam, ssh
domains = intra.company.com

[nss]
homedir_substring = /home

[pam]

[sudo]

[autofs]

[ssh]

[pac]

[ifp]

[secrets]

[session_recording]

Hi,

according to the log snippets you've added SSSD on the client waits for 6s on a reply from the IPA server, then it timeouts and treats the request as failed. While it would be expected that SSSD tries another server if defined in sssd.conf it is not expected that it tries to connect to the same server that many times. So it looks like there is an issue which prevents SSSD from switching into offline mode in the case.

But in general there shouldn't be such a delay in the response of the server since you said that calling id for the given user in the IPA server returns pretty fast the server should be able to answer the request from the client fast as well, even if all entries in the cache in the server must be refreshed as well.

To better understand what is going on on the server it would be good if you can add corresponding server logs. For a start I wonder if you can send the directory server access log /var/log/dirsrv/slapd-YOUR-DOMAIN/access from the same time interval as the SSSD logs from above or from another time interval where this issue occurs on a client?

bye,
Sumit

Hi @sbose, I looked through the logs and they look normal. The client seem to query every AD/external account, both user and computer accounts, and all groups. Well over 10,000 queries. Here is a snip, there are other, but all seem to query the trust view. I cannot include 20-30k lines log.

[16/Aug/2019:09:54:10.945208200 +0000] conn=766 op=3094 RESULT err=0 tag=101 nentries=0 etime=0.0000135309
[16/Aug/2019:09:54:10.945373681 +0000] conn=766 op=3095 SRCH base="cn=Default Trust View,cn=views,cn=accounts,dc=intra,dc=gammatelecom,dc=com" scope=2 filter="(&(objectClass=ipaOverrideAnchor)(ipaAnchorUUID=:SID:S-1-5-21-1659004503-507921405-839522115-26662))" attrs=ALL
[16/Aug/2019:09:54:10.945478702 +0000] conn=766 op=3095 RESULT err=0 tag=101 nentries=0 etime=0.0000140004
[16/Aug/2019:09:54:10.945671876 +0000] conn=766 op=3096 SRCH base="cn=Default Trust View,cn=views,cn=accounts,dc=intra,dc=gammatelecom,dc=com" scope=2 filter="(&(objectClass=ipaOverrideAnchor)(ipaAnchorUUID=:SID:S-1-5-21-1659004503-507921405-839522115-15107))" attrs=ALL
[16/Aug/2019:09:54:10.945776636 +0000] conn=766 op=3096 RESULT err=0 tag=101 nentries=0 etime=0.0000139791
[16/Aug/2019:09:54:10.945929498 +0000] conn=766 op=3097 SRCH base="cn=Default Trust View,cn=views,cn=accounts,dc=intra,dc=gammatelecom,dc=com" scope=2 filter="(&(objectClass=ipaOverrideAnchor)(ipaAnchorUUID=:SID:S-1-5-21-1659004503-507921405-839522115-23192))" attrs=ALL

There is nothing special around the "disconnect" time. Just an unbind. Looks completely normal. No errors. I think the client timeouts internally or something. I'm not familiar with the internals of SSSD.

On the client, the id command takes minutes even after subsequent runs. Isn't SSSD supposed to cache? Here is me running it 3 times, one after the other:

# time id aduser
uid=....

real    3m37.239s
user    0m0.004s
sys     0m0.009s

# time id aduser
uid=...

real    2m21.717s
user    0m0.003s
sys     0m0.007s

#time id aduser
uid=....

real    3m11.594s
user    0m0.001s
sys     0m0.007s

Then if I remove the cache files its suddenly instant:

# systemctl stop sssd
# rm -f /var/lib/sss/db/*
# systemctl start sssd
# time id adduser
uid=....

real    0m2.267s
user    0m0.004s
sys     0m0.003s

If I leave it alone for about an hour, and run id again, again over 3 minutes.

Hi,

this request which times out is not an ordinary search but an extended operation. This can be identified in the access lgos of the directory server by EXT oid="2.16.840.1.113730.3.8.10.4.1" instead of SRCH. Can you add some of those together with the RESULT which has the same conn and op values?

SSSD caches the data with a lifetime, after cached entry is expired, SSSD tries to read it again from the server. If the server is not reachable or the requests times out SSSD should switch into offline mode and return the data from the cache. As I said it looks like SSSD does not switch into offline mode here, which has to be fixed.

But what is puzzling is why SSSD cannot refresh the data. Do you really have to remove the cached data or is calling sssctl restart sssd sufficient to make the lookup fast again?

bye,
Sumit

Putting the queries together is tricky as client disconnects, I see like over 10 connections. There is a connection from the FreeIPA server to itself that does thousands of queries. All the queries look ok. err=0 on all of them, nothing unusual.

I'm not sure if it relevant, but sssd on the masters has:

subdomain_inherit = ldap_purge_cache_timeout
ldap_purge_cache_timeout = 0

Everything else is default.

On client, just restarting sssd doesn't seem to make a big difference. Removing the cache seems to do the best job. I did the following experimentation:

set ipa_server = freeipa-01.intra.company.com
and same server in resolv.conf to make collecting logs easier
----

# date -u; time id aduser
Fri Aug 16 11:52:37 UTC 2019
uid=....

real    3m30.843s
user    0m0.002s
sys     0m0.010s
# date -u; systemctl restart sssd
Fri Aug 16 11:58:45 UTC 2019
# date -u; time id aduser
Fri Aug 16 11:58:52 UTC 2019
uid=.....

real    0m50.216s
user    0m0.003s
sys     0m0.007s

---- waited over 1 hour -----

# date -u; systemctl restart sssd
Fri Aug 16 13:26:37 UTC 2019
# date -u; time id aduser
Fri Aug 16 13:26:51 UTC 2019
uid=....

real    0m6.525s
user    0m0.000s
sys     0m0.010s

/var/log/sssd/sssd_intra.company.com.log
(Fri Aug 16 14:26:37 2019) [sssd[be[intra.company.com]]] [resolv_gethostbyname_next] (0x0100): No more hosts databases to retry
(Fri Aug 16 14:26:37 2019) [sssd[be[intra.company.com]]] [ipa_dyndns_nsupdate_done] (0x0040): DNS update finished
(Fri Aug 16 14:26:47 2019) [sssd[be[intra.company.com]]] [ipa_sudo_fetch_rules_done] (0x0040): Received 1 sudo rules
(Fri Aug 16 14:26:51 2019) [sssd[be[intra.company.com]]] [ipa_id_get_account_info_orig_done] (0x0080): Object not found, ending request
(Fri Aug 16 14:26:57 2019) [sssd[be[intra.company.com]]] [ipa_s2n_get_user_done] (0x0040): s2n exop request failed.
(Fri Aug 16 14:26:57 2019) [sssd[be[intra.company.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA'
(Fri Aug 16 14:26:57 2019) [sssd[be[intra.company.com]]] [get_port_status] (0x0080): SSSD is unable to complete the full connection request, this internal status does not necessarily indicate network port issues.
(Fri Aug 16 14:26:57 2019) [sssd[be[intra.company.com]]] [fo_resolve_service_send] (0x0020): No available servers for service 'IPA'
(Fri Aug 16 14:26:57 2019) [sssd[be[intra.company.com]]] [sdap_id_op_connect_done] (0x0020): Failed to connect, going offline (5 [Input/output error])
(Fri Aug 16 14:26:57 2019) [sssd[be[intra.company.com]]] [be_run_offline_cb] (0x0080): Going offline. Running callbacks.
(Fri Aug 16 14:26:57 2019) [sssd[be[intra.company.com]]] [ipa_subdomain_account_done] (0x0040): ipa_get_*_acct request failed: [11]: Resource temporarily unavailable.
(Fri Aug 16 14:26:57 2019) [sssd[be[intra.company.com]]] [dp_req_reply_gen_error] (0x0080): DP Request [Account #4]: Finished. Backend is currently offline.
(Fri Aug 16 14:26:57 2019) [sssd[be[intra.company.com]]] [dp_req_reply_gen_error] (0x0080): DP Request [Account #5]: Finished. Backend is currently offline.
(Fri Aug 16 14:26:57 2019) [sssd[be[intra.company.com]]] [dp_req_reply_gen_error] (0x0080): DP Request [Initgroups #6]: Finished. Backend is currently offline.
(Fri Aug 16 14:26:57 2019) [sssd[be[intra.company.com]]] [dp_req_reply_gen_error] (0x0080): DP Request [Account #7]: Finished. Backend is currently offline.
...
# date -u; time id aduser
Fri Aug 16 13:30:00 UTC 2019
uid=.....

real    0m0.032s
user    0m0.000s
sys     0m0.006s
[root@c7client db]# date -u; time id aduser
Fri Aug 16 13:32:20 UTC 2019
uid=.....

real    0m25.192s
user    0m0.003s
sys     0m0.006s

--- wait over 1 hour again ----

# date -u; systemctl restart sssd
Fri Aug 16 14:59:02 UTC 2019
# date -u; time id aduser
Fri Aug 16 14:59:43 UTC 2019
uid=.....

real    0m24.557s
user    0m0.003s
sys     0m0.006s

/var/log/sssd/sssd_intra.company.com.log
(Fri Aug 16 15:59:49 2019) [sssd[be[intra.company.com]]] [set_server_common_status] (0x0100): Marking server 'hex-freeipa-01.intra.company.com' as 'working'
(Fri Aug 16 15:59:49 2019) [sssd[be[intra.company.com]]] [ipa_id_get_account_info_orig_done] (0x0080): Object not found, ending request
(Fri Aug 16 15:59:49 2019) [sssd[be[intra.company.com]]] [ipa_id_get_account_info_orig_done] (0x0080): Object not found, ending request
(Fri Aug 16 15:59:49 2019) [sssd[be[intra.company.com]]] [ipa_id_get_account_info_orig_done] (0x0080): Object not found, ending request
(Fri Aug 16 15:59:49 2019) [sssd[be[intra.company.com]]] [ipa_id_get_account_info_orig_done] (0x0080): Object not found, ending request
(Fri Aug 16 15:59:50 2019) [sssd[be[intra.company.com]]] [sysdb_set_cache_entry_attr] (0x0080): ldb_modify failed: [No such object](32)[ldb_wait from ldb_modify with LDB_WAIT_ALL: No such object (32)]
(Fri Aug 16 15:59:50 2019) [sssd[be[intra.company.com]]] [sysdb_set_entry_attr] (0x0080): Cannot set attrs for name=aduser@company.com,cn=groups,cn=company.com,cn=sysdb, 2 [No such file or directory]
(Fri Aug 16 15:59:50 2019) [sssd[be[intra.company.com]]] [sysdb_store_group] (0x0040): Cache update failed: 2
(Fri Aug 16 15:59:50 2019) [sssd[be[intra.company.com]]] [ipa_s2n_save_objects] (0x0040): sysdb_store_group failed.
(Fri Aug 16 15:59:50 2019) [sssd[be[intra.company.com]]] [ipa_s2n_get_user_done] (0x0040): ipa_s2n_save_objects failed.
(Fri Aug 16 15:59:50 2019) [sssd[be[intra.company.com]]] [sysdb_mod_group_member] (0x0080): ldb_modify failed: [Attribute or value exists](20)[attribute 'member': value #17 on 'name=aduser@company.com,cn=users,cn=company.com,cn=sysdb' already exists]
(Fri Aug 16 15:59:50 2019) [sssd[be[intra.company.com]]] [sysdb_update_members_ex] (0x0020): Could not add member [aduser@company.com] to group [name=aduser@company.com,cn=users,cn=company.com,cn=sysdb]. Skipping.
(Fri Aug 16 15:59:50 2019) [sssd[be[intra.company.com]]] [sysdb_mod_group_member] (0x0080): ldb_modify failed: [Attribute or value exists](20)[attribute 'member': value #17 on 'name=aduser@company.com,cn=users,cn=company.com,cn=sysdb' already exists]
(Fri Aug 16 15:59:50 2019) [sssd[be[intra.company.com]]] [sysdb_update_members_ex] (0x0020): Could not add member [aduser@company.com] to group [name=aduser@company.com,cn=users,cn=company.com,cn=sysdb]. Skipping.
(Fri Aug 16 15:59:54 2019) [sssd[be[intra.company.com]]] [sysdb_mod_group_member] (0x0080): ldb_modify failed: [No such object](32)[ldb_wait from ldb_modify with LDB_WAIT_ALL: No such object (32)]
(Fri Aug 16 15:59:54 2019) [sssd[be[intra.company.com]]] [sysdb_update_members_ex] (0x0020): Could not add member [anotheraduser@company.com] to group [name=grooup1@company.com,cn=groups,cn=company.com,cn=sysdb]. Skipping.
(Fri Aug 16 15:59:54 2019) [sssd[be[intra.company.com]]] [sysdb_mod_group_member] (0x0080): ldb_modify failed: [No such object](32)[ldb_wait from ldb_modify with LDB_WAIT_ALL: No such object (32)]
(Fri Aug 16 15:59:54 2019) [sssd[be[intra.company.com]]] [sysdb_update_members_ex] (0x0020): Could not add member [anotheraduser@company.com] to group [name=grooup2@company.com,cn=groups,cn=company.com,cn=sysdb]. Skipping.
(Fri Aug 16 15:59:54 2019) [sssd[be[intra.company.com]]] [sysdb_mod_group_member] (0x0080): ldb_modify failed: [No such object](32)[ldb_wait from ldb_modify with LDB_WAIT_ALL: No such object (32)]
(Fri Aug 16 15:59:54 2019) [sssd[be[intra.company.com]]] [sysdb_update_members_ex] (0x0020): Could not add member [anotheraduser@company.com] to group [name=grooup3@company.com,cn=groups,cn=company.com,cn=sysdb]. Skipping.
repeats few dozen times other users and groups
(Fri Aug 16 16:00:07 2019) [sssd[be[intra.company.com]]] [ipa_s2n_get_list_next] (0x0040): s2n exop request failed.
(Fri Aug 16 16:00:07 2019) [sssd[be[intra.company.com]]] [ipa_s2n_get_list_done] (0x0040): s2n get_fqlist request failed.
(Fri Aug 16 16:00:07 2019) [sssd[be[intra.company.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA'
(Fri Aug 16 16:00:07 2019) [sssd[be[intra.company.com]]] [get_port_status] (0x0080): SSSD is unable to complete the full connection request, this internal status does not necessarily indicate network port issues.
(Fri Aug 16 16:00:07 2019) [sssd[be[intra.company.com]]] [fo_resolve_service_send] (0x0020): No available servers for service 'IPA'
(Fri Aug 16 16:00:07 2019) [sssd[be[intra.company.com]]] [sdap_id_op_connect_done] (0x0020): Failed to connect, going offline (5 [Input/output error])
(Fri Aug 16 16:00:07 2019) [sssd[be[intra.company.com]]] [be_run_offline_cb] (0x0080): Going offline. Running callbacks.
(Fri Aug 16 16:00:07 2019) [sssd[be[intra.company.com]]] [ipa_subdomain_account_done] (0x0040): ipa_get_*_acct request failed: [11]: Resource temporarily unavailable.
(Fri Aug 16 16:00:07 2019) [sssd[be[intra.company.com]]] [dp_req_reply_gen_error] (0x0080): DP Request [Account #13]: Finished. Backend is currently offline.
(Fri Aug 16 16:00:07 2019) [sssd[be[intra.company.com]]] [dp_req_reply_gen_error] (0x0080): DP Request [Account #14]: Finished. Backend is currently offline.
(Fri Aug 16 16:00:07 2019) [sssd[be[intra.company.com]]] [dp_req_reply_gen_error] (0x0080): DP Request [Account #15]: Finished. Backend is currently offline.
(Fri Aug 16 16:00:07 2019) [sssd[be[intra.company.com]]] [dp_req_reply_gen_error] (0x0080): DP Request [Account #16]: Finished. Backend is currently offline.
(Fri Aug 16 16:00:07 2019) [sssd[be[intra.company.com]]] [dp_req_reply_gen_error] (0x0080): DP Request [Account #17]: Finished. Backend is currently offline.
repeats
(Fri Aug 16 16:00:08 2019) [sssd[be[intra.company.com]]] [dp_req_reply_gen_error] (0x0080): DP Request [Account #86]: Finished. Backend is currently offline.
(Fri Aug 16 16:00:08 2019) [sssd[be[intra.company.com]]] [dp_req_reply_gen_error] (0x0080): DP Request [Account #87]: Finished. Backend is currently offline.
(Fri Aug 16 16:00:08 2019) [sssd[be[intra.company.com]]] [dp_req_reply_gen_error] (0x0080): DP Request [Account #88]: Finished. Backend is currently offline.
--- point the client to freeipa-02.intra.company.com, both sssd and resolv.conf -----

# date -u; systemctl restart sssd
Fri Aug 16 15:09:38 UTC 2019
[root@racktables-test db]# date -u; time id aduser
Fri Aug 16 15:09:46 UTC 2019
uid=.....

real    1m11.915s
user    0m0.002s
sys     0m0.009s

# systemctl stop sssd
# rm -f /var/lib/sss/db/*
# systemctl start sssd
# date -u; time id aduser
Fri Aug 16 15:12:24 UTC 2019
uid=...

real    0m7.304s
user    0m0.000s
sys     0m0.008s

Putting the queries together is tricky as client disconnects, I see like over 10 connections. There is a connection from the FreeIPA server to itself that does thousands of queries. All the queries look ok. err=0 on all of them, nothing unusual.

Can you try to collect some of the EXT oid="2.16.840.1.113730.3.8.10.4.1" requests with the matching RESULT?

I'm not sure if it relevant, but sssd on the masters has:
subdomain_inherit = ldap_purge_cache_timeout
ldap_purge_cache_timeout = 0

Everything else is default.
On client, just restarting sssd doesn't seem to make a big difference. Removing the cache seems to do the best job. I did the following experimentation:

Can you add a tar ball with the full SSSD logs after restart without removing the cache?

bye,
Sumit

Hi @sbose, here is what I ran:

# systemctl stop sssd
# > sssd_intra.gammatelecom.com.log
# systemctl start sssd
# time id aduser
uid=....

real    11m27.230s
user    0m0.004s
sys     0m0.014s

Can you try to collect some of the EXT oid="2.16.840.1.113730.3.8.10.4.1" requests with the matching RESULT?

Here are all EXT queries from the dirserv for that duration from the client: https://paste.fedoraproject.org/paste/jw6Ft385xE~~OJ4cCFlcDw/raw

Can you add a tar ball with the full SSSD logs after restart without removing the cache?

Full client log for the IPA domain: https://paste.fedoraproject.org/paste/ysZkzRkWAWuoRnuSn8sb0w/raw

You mentioned tar, do you want all sssd client logs? If yes, what debug levels do I set? Where can I upload the tar file?

Metadata Update from @thalman:
- Issue tagged with: Future milestone

4 years ago

SSSD is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in SSSD's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/SSSD/sssd/issues/5030

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @pbrezina:
- Issue close_status updated to: cloned-to-github
- Issue status updated to: Closed (was: Open)

3 years ago

Login to comment on this ticket.

Metadata