#2297 After some unknown trigger "getent [i.e. passwd]" stops returning freeipa data
Closed: Invalid None Opened 10 years ago by brianjmurrell.

Using sssd 1.9.2 (package 1.9.2-129.el6_5.4), I have found that after some unknown trigger, possibly nothing more than elapsed time, getent passwd blocks for about a minute and then fails to return FreeIPA users.

Simply restarting FreeIPA clears the symptom, but that is clearly not a solution.

A log from sssd's nss responder when this happens:

(Tue Apr  1 06:19:57 2014) [sssd[nss]] [monitor_common_rotate_logs] (0x0010): Debug level changed to 0x07f0
(Tue Apr  1 06:19:57 2014) [sssd[nss]] [nss_clear_memcache] (0x0400): CLEAR_MC_FLAG not found. Nothing to do.
(Tue Apr  1 06:19:57 2014) [sssd[nss]] [nss_orphan_netgroups] (0x0400): Removing netgroups from memory cache.
(Tue Apr  1 06:20:08 2014) [sssd[nss]] [accept_fd_handler] (0x0400): Client connected!
(Tue Apr  1 06:20:08 2014) [sssd[nss]] [sss_cmd_get_version] (0x0200): Received client version [1].
(Tue Apr  1 06:20:08 2014) [sssd[nss]] [sss_cmd_get_version] (0x0200): Offered version [1].
(Tue Apr  1 06:20:08 2014) [sssd[nss]] [nss_cmd_setpwent_send] (0x0100): Received setpwent request
(Tue Apr  1 06:20:29 2014) [sssd[nss]] [accept_fd_handler] (0x0400): Client connected!
(Tue Apr  1 06:20:29 2014) [sssd[nss]] [sss_cmd_get_version] (0x0200): Received client version [1].
(Tue Apr  1 06:20:29 2014) [sssd[nss]] [sss_cmd_get_version] (0x0200): Offered version [1].
(Tue Apr  1 06:20:29 2014) [sssd[nss]] [sss_parse_name_for_domains] (0x0200): name 'root' matched without domain, user is root
(Tue Apr  1 06:20:29 2014) [sssd[nss]] [sss_parse_name_for_domains] (0x0200): using default domain [(null)]
(Tue Apr  1 06:20:29 2014) [sssd[nss]] [nss_cmd_initgroups] (0x0100): Requesting info for [root] from [<ALL>]
(Tue Apr  1 06:20:29 2014) [sssd[nss]] [nss_cmd_initgroups_search] (0x0040): User [root] does not exist in [LOCAL]! (negative cache)
(Tue Apr  1 06:20:29 2014) [sssd[nss]] [nss_cmd_initgroups_search] (0x0040): User [root] does not exist in [example.com]! (negative cache)
(Tue Apr  1 06:20:29 2014) [sssd[nss]] [nss_cmd_initgroups_search] (0x0040): User [root] does not exist in [example.com]! (negative cache)
(Tue Apr  1 06:20:29 2014) [sssd[nss]] [nss_cmd_initgroups_search] (0x0080): No matching domain found for [root], fail!
(Tue Apr  1 06:20:29 2014) [sssd[nss]] [accept_fd_handler] (0x0400): Client connected!
(Tue Apr  1 06:20:29 2014) [sssd[nss]] [sss_cmd_get_version] (0x0200): Received client version [1].
(Tue Apr  1 06:20:29 2014) [sssd[nss]] [sss_cmd_get_version] (0x0200): Offered version [1].
(Tue Apr  1 06:20:29 2014) [sssd[nss]] [sss_parse_name_for_domains] (0x0200): name 'root' matched without domain, user is root
(Tue Apr  1 06:20:29 2014) [sssd[nss]] [sss_parse_name_for_domains] (0x0200): using default domain [(null)]
(Tue Apr  1 06:20:29 2014) [sssd[nss]] [nss_cmd_initgroups] (0x0100): Requesting info for [root] from [<ALL>]
(Tue Apr  1 06:20:29 2014) [sssd[nss]] [nss_cmd_initgroups_search] (0x0040): User [root] does not exist in [LOCAL]! (negative cache)
(Tue Apr  1 06:20:29 2014) [sssd[nss]] [nss_cmd_initgroups_search] (0x0040): User [root] does not exist in [example.com]! (negative cache)
(Tue Apr  1 06:20:29 2014) [sssd[nss]] [nss_cmd_initgroups_search] (0x0040): User [root] does not exist in [example.com]! (negative cache)
(Tue Apr  1 06:20:29 2014) [sssd[nss]] [nss_cmd_initgroups_search] (0x0080): No matching domain found for [root], fail!
(Tue Apr  1 06:20:29 2014) [sssd[nss]] [client_recv] (0x0200): Client disconnected!
(Tue Apr  1 06:21:08 2014) [sssd[nss]] [accept_fd_handler] (0x0400): Client connected!
(Tue Apr  1 06:21:08 2014) [sssd[nss]] [sss_cmd_get_version] (0x0200): Received client version [1].
(Tue Apr  1 06:21:08 2014) [sssd[nss]] [sss_cmd_get_version] (0x0200): Offered version [1].
(Tue Apr  1 06:21:08 2014) [sssd[nss]] [nss_cmd_endpwent] (0x0100): Terminating request info for all accounts
(Tue Apr  1 06:21:08 2014) [sssd[nss]] [client_recv] (0x0200): Client disconnected!
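
The one-minute stall is visible in the log above as the gap between the "Received setpwent request" line (06:20:08) and the nss_cmd_endpwent line (06:21:08). As a rough sketch for checking other logs for the same pattern, the gap can be computed from the timestamps. The helper names below (parse_ts, enumeration_stall_seconds) are made up for illustration and are not part of SSSD; the script only assumes the log line layout shown above.

```python
import re
from datetime import datetime

# Matches the prefix of sssd_nss debug lines as they appear in this ticket, e.g.
# (Tue Apr  1 06:20:08 2014) [sssd[nss]] [nss_cmd_setpwent_send] (0x0100): ...
LINE_RE = re.compile(r"^\((?P<ts>[^)]+)\) \[sssd\[nss\]\] \[(?P<func>\w+)\]")


def parse_ts(text):
    """Parse a timestamp such as 'Tue Apr  1 06:20:08 2014'."""
    # Collapse the padding space used for single-digit days before parsing.
    return datetime.strptime(" ".join(text.split()), "%a %b %d %H:%M:%S %Y")


def enumeration_stall_seconds(log_lines):
    """Return seconds between the first setpwent request and the next
    endpwent, or None if the log does not contain such a pair."""
    start = None
    for line in log_lines:
        match = LINE_RE.match(line)
        if match is None:
            continue
        func = match.group("func")
        if func == "nss_cmd_setpwent_send" and start is None:
            start = parse_ts(match.group("ts"))
        elif func == "nss_cmd_endpwent" and start is not None:
            return (parse_ts(match.group("ts")) - start).total_seconds()
    return None


if __name__ == "__main__":
    # Two lines copied from the log in this ticket.
    sample = [
        "(Tue Apr  1 06:20:08 2014) [sssd[nss]] [nss_cmd_setpwent_send] (0x0100): Received setpwent request",
        "(Tue Apr  1 06:21:08 2014) [sssd[nss]] [nss_cmd_endpwent] (0x0100): Terminating request info for all accounts",
    ]
    print(enumeration_stall_seconds(sample))  # 60.0
```

This only measures the delay. It says nothing about why the enumeration request was not answered sooner.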

And the sssd.conf file:

[domain/example.com]

cache_credentials = True
krb5_store_password_if_offline = True
ipa_domain = example.com
id_provider = ipa
auth_provider = ipa
access_provider = ipa
ipa_hostname = foo-3.example.com
chpass_provider = ipa
ipa_server = _srv_, ipa-2.example.com
ldap_tls_cacert = /etc/ipa/ca.crt
[sssd]
config_file_version = 2
# Number of times services should attempt to reconnect in the
# event of a crash or restart before they give up
reconnection_retries = 3
# if a backend is particularly slow you can raise this timeout here
sbus_timeout = 30
services = nss, pam, ssh
domains = LOCAL, example.com, EXAMPLE.COM
# SSSD will not start if you don't configure any domain.
# Add new domain configurations as [domain/<NAME>] sections.
# Then add the list of domains (in the order you want them to be
# queried) in the 'domains' attribute above and uncomment it


[nss]
# the following prevents sssd from searching for the root user/group in
# all domains (you can add here a comma separated list of system accounts that
# are always going to be /etc/passwd users, or that you want to filter out)
filter_groups = root
filter_users = root
reconnection_retries = 3

# The EntryCacheTimeout indicates the number of seconds to retain before
# an entry in cache is considered stale and must block to refresh.
# The EntryCacheNoWaitRefreshTimeout indicates the number of seconds to
# wait before updating the cache out-of-band. (NSS requests will still
# be returned from cache until the full EntryCacheTimeout). Setting this
# value to 0 turns this feature off (default)
# entry_cache_timeout = 600
# entry_cache_nowait_timeout = 300

[pam]
reconnection_retries = 3

# Example LOCAL domain that stores all users natively in the SSSD internal
# directory. These local users and groups are not visible in /etc/passwd, it
# now contains only root and system accounts.
[domain/LOCAL]
description = LOCAL Users domain
id_provider = local
enumerate = true
min_id = 500
max_id = 999

[domain/EXAMPLE.COM]
auth_provider = krb5
chpass_provider = krb5
krb5_kdcip = ipa-1, ipa-2
krb5_kpasswd = ipa-1
krb5_realm = EXAMPLE.COM
id_provider = ldap
ldap_uri = ldap://ipa-1, ldap://ipa-2
ldap_search_base = cn=accounts,dc=example,dc=com
ldap_schema = rfc2307bis
# ldap_tls_reqcert = demand
# ldap_tls_cacert = /etc/ssl/certs/slapd-ca-cert.pem
cache_credentials = true
enumerate = true
debug_level = 10
[ssh]

As a first debugging step I am going to remove the LOCAL provider.


I attempted to reproduce the issue locally, so far with no luck. Judging by the setpwent code and the debug messages I received from the reporter on IRC, it looks like we fail to terminate the request properly once the first setpwent result (which we currently store in a hash) expires. I haven't been able to figure out why.

I recommended removing the LOCAL provider because the reporter doesn't need it anyway, so he'd get somewhat faster lookups. Also, the LOCAL provider is a special case that has no backend connection, so chances are that removing it would solve the issue completely.

Hi, did removing the LOCAL domain help you in your deployment?

Ping, did you have a chance to test removing the LOCAL backend?

Please reopen if the problem persists.

resolution: => worksforme
status: new => closed

Metadata Update from @brianjmurrell:
- Issue set to the milestone: NEEDS_TRIAGE

7 years ago

SSSD is moving from Pagure to GitHub. This means that new issues and pull requests
will be accepted only in SSSD's GitHub repository.

This issue has been cloned to GitHub and is available here:
- https://github.com/SSSD/sssd/issues/3339

If you want to receive further updates on the issue, please navigate to the GitHub issue
and click the subscribe button.

Thank you for understanding. We apologize for any inconvenience.
