#302 If Directory Server for LDAP BE is offline, when SSSD is started, and then brought back up - LDAP BE never detects that the server is back on-line
Closed: Fixed. Opened 14 years ago by jgalipea.

Description
If SSSD is configured with an LDAP BE and started while the directory server is down, and a search is attempted before the directory server is brought back up, getent never returns users or groups until sssd is restarted.[[BR]]

Steps to Reproduce[[BR]]

  1. stop your directory server[[BR]]
  2. install sssd and configure an LDAP domain that points at the unavailable directory server[[BR]]
  3. getent -s sss passwd <ldapuser> (expect nothing returned).[[BR]]
  4. start your directory server[[BR]]
  5. getent -s sss passwd <ldapuser>[[BR]]
  6. wait 5 minutes and search again[[BR]]
  7. wait 10 minutes and search again[[BR]]
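
Steps 5 to 7 can be scripted as a small polling loop. The following is only a sketch: the helper names `lookup_user` and `poll_user` are made up for illustration, and it assumes the waits in steps 6 and 7 are sequential (5 minutes, then a further 10 minutes), which the steps do not state explicitly.

```python
import subprocess
import time


def lookup_user(name, run=subprocess.run):
    """Return True if `getent -s sss passwd NAME` finds the user."""
    result = run(
        ["getent", "-s", "sss", "passwd", name],
        capture_output=True,
        text=True,
    )
    # getent exits with 0 when the key is found and non-zero otherwise.
    return result.returncode == 0 and bool(result.stdout.strip())


def poll_user(name, delays=(0, 300, 600), lookup=lookup_user, sleep=time.sleep):
    """Repeat the lookup after each delay (in seconds); one result per attempt."""
    results = []
    for delay in delays:
        sleep(delay)
        results.append(lookup(name))
    return results
```

Calling `poll_user("puser1")` after the directory server has been started gives one boolean per attempt; with the behaviour reported here, every entry would be `False` until sssd is restarted.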

RESULTS:[[BR]]
Users and groups are never found.[[BR]]

DEBUG[[BR]]

{{{
[sssd] [service_check_alive] (4): Checking service LDAP(29311) is still alive
[sssd] [service_send_ping] (4): Pinging LDAP
[sssd] [service_check_alive] (4): Checking service nss(29312) is still alive
[sssd] [service_send_ping] (4): Pinging nss
[sssd] [service_check_alive] (4): Checking service pam(29313) is still alive
[sssd] [service_send_ping] (4): Pinging pam
[sssd] [ping_check] (4): Service LDAP replied to ping
[sssd] [ping_check] (4): Service nss replied to ping
[sssd] [ping_check] (4): Service pam replied to ping
[sssd] [service_check_alive] (4): Checking service nss(29312) is still alive
[sssd] [service_send_ping] (4): Pinging nss
[sssd] [service_check_alive] (4): Checking service pam(29313) is still alive
[sssd] [service_send_ping] (4): Pinging pam
[sssd] [ping_check] (4): Service nss replied to ping
[sssd] [ping_check] (4): Service pam replied to ping
[sssd[nss]] [accept_fd_handler] (4): Client connected!
[sssd[nss]] [nss_cmd_getpwnam] (4): Requesting info for [puser1] from [<ALL>]
[sssd[nss]] [nss_cmd_getpwnam] (4): Requesting info for [puser1@LDAP]
[sssd[nss]] [sss_dp_send_acct_req_create] (4): Sending request for [LDAP][4097][core][name=puser1]
[sssd[be[LDAP]]] [be_get_account_info] (4): Got request for [4097][core][name=puser1]
[sssd[be[LDAP]]] [be_get_account_info] (4): Request processed. Returned 1,11,Fast reply - offline
[sssd[be[LDAP]]] [fo_resolve_service_send] (1): No available servers for service 'LDAP'
[sssd[nss]] [sss_dp_get_reply] (4): Got reply (1, 11, Fast reply - offline) from Data Provider
[sssd[nss]] [nss_cmd_getpwnam_dp_callback] (2): Unable to get information from Data Provider
Error: 1, 11, Fast reply - offline
Will try to return what we have in cache
[sssd[nss]] [nss_cmd_getpwnam_callback] (2): No matching domain found for [puser1], fail!
[sssd[nss]] [nss_cmd_getpwnam_callback] (2): No results for getpwnam call
[sssd] [service_check_alive] (4): Checking service nss(29312) is still alive
[sssd] [service_send_ping] (4): Pinging nss
[sssd] [service_check_alive] (4): Checking service pam(29313) is still alive
[sssd] [service_send_ping] (4): Pinging pam
[sssd] [ping_check] (4): Service nss replied to ping
[sssd] [ping_check] (4): Service pam replied to ping
}}}

EXPECTED:[[BR]]

Users and groups are found after the directory server is started.
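
To spot the relevant entries in a longer debug log, the lines can be filtered mechanically. This is a rough sketch that assumes every entry follows the `[process] [function] (level): message` shape seen in the DEBUG excerpt; the names `parse_line` and `offline_replies` are hypothetical and not part of SSSD.

```python
import re

# Matches lines such as:
# [sssd[be[LDAP]]] [be_get_account_info] (4): Request processed. ...
LINE_RE = re.compile(
    r"^\[(?P<process>.+?)\] \[(?P<function>\w+)\] \((?P<level>\d+)\): (?P<message>.*)$"
)


def parse_line(line):
    """Split one debug line into its parts, or return None for other lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["level"] = int(entry["level"])
    return entry


def offline_replies(lines):
    """Return parsed back-end entries whose message mentions 'offline'."""
    entries = (parse_line(line) for line in lines)
    return [
        e for e in entries
        if e and e["process"].startswith("sssd[be[") and "offline" in e["message"]
    ]
```

Applied to the excerpt above, the filter keeps the `be_get_account_info` entry that returned `Fast reply - offline` and skips the monitor pings and continuation lines.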


Adding configuration:

[sssd]
config_file_version = 2
domains = LDAP
sbus_timeout = 30
services = nss, pam

[nss]
filter_groups = root
filter_users = root

[pam]

[domain/LDAP]
auth_provider = ldap
cache_credentials = FALSE
enumerate = TRUE
id_provider = ldap
ldap_group_search_base = ou=Groups,dc=example,dc=com
ldap_tls_reqcert = never
ldap_uri = ldap://sssd-rhds.idm.lab.bos.redhat.com:389
ldap_user_search_base = ou=People,dc=example,dc=com
ldap_default_bind_dn = uid=puser1,ou=People,dc=example,dc=com
ldap_default_authtok_type = password
ldap_default_authtok = Secret123
max_id = 1010
min_id = 1000
timeout = 30

Fields changed

testsupdated: 0 => 1

I believe this is related to failover processing.

component: SSSD => Failover
milestone: NEEDS_TRIAGE => SSSD 1.0
owner: somebody => simo
priority: major => critical

It looks like the failover code is marking the server unreachable, but it is not releasing it to try again later.
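
For illustration only, the behaviour I would expect looks roughly like the sketch below: a server marked unreachable becomes eligible again once a retry timeout has passed. This is not the SSSD failover code; the class name `ServerStatus`, its methods, and the 30-second timeout are all assumptions made up for this example.

```python
import time


class ServerStatus:
    """Tracks whether one server may be tried (hypothetical, not SSSD's code)."""

    def __init__(self, retry_timeout=30.0, clock=time.monotonic):
        self.retry_timeout = retry_timeout  # seconds; an illustrative value
        self._clock = clock
        self._failed_at = None  # None means the server is considered usable

    def mark_unreachable(self):
        """Record the time of the failed connection attempt."""
        self._failed_at = self._clock()

    def mark_working(self):
        """Clear the failure after a successful connection."""
        self._failed_at = None

    def may_try(self):
        """Return True if a connection attempt is allowed right now."""
        if self._failed_at is None:
            return True
        if self._clock() - self._failed_at >= self.retry_timeout:
            # Release the server so the next request attempts a connection.
            self._failed_at = None
            return True
        return False
```

The symptom in this ticket matches what would happen if the release branch in `may_try` were never reached: the server stays marked unreachable and every request gets the fast offline reply.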

Martin, can you check the logic here, please?

owner: simo => mnagy

Raising this to blocker. It seems to be pervasive.

priority: critical => blocker

Fields changed

summary: If Directory Server for LDAP BE is offline, when SSSD is started, and then brought back up - getent does detect that LDAP is alive => If Directory Server for LDAP BE is offline, when SSSD is started, and then brought back up - LDAP BE never detects that the server is back on-line

Fixed by 9fbf84a

fixedin: => 1.0.0
resolution: => fixed
status: new => closed

Fields changed

rhbz: => 0

Metadata Update from @jgalipea:
- Issue assigned to mnagy
- Issue set to the milestone: SSSD 1.0

7 years ago

SSSD is moving from Pagure to GitHub. This means that new issues and pull requests
will be accepted only in SSSD's GitHub repository.

This issue has been cloned to GitHub and is available here:
- https://github.com/SSSD/sssd/issues/1344

If you want to receive further updates on the issue, please navigate to the GitHub issue
and click the subscribe button.

Thank you for understanding. We apologize for any inconvenience.
