From d534112f3135f0627f5354714536142ea1143cfb Mon Sep 17 00:00:00 2001 From: Pavel Březina Date: Aug 13 2017 19:36:15 +0000 Subject: Add: How to troubleshoot backend Merges: https://pagure.io/SSSD/docs/pull-request/5 --- diff --git a/users/troubleshooting.rst b/users/troubleshooting.rst index 85db182..5c64b39 100644 --- a/users/troubleshooting.rst +++ b/users/troubleshooting.rst @@ -16,6 +16,18 @@ only be performed when the information about a user can be retrieved, so if authentication doesn't work in your case, please make sure you can at least obtain info from about the user with ``getent passwd $user`` and ``id``. +Troubleshooting guides +---------------------- + +We are trying to document on examples how to read debug messages and how to +troubleshoot specific issues. These are currently available guides +that can help you: + +.. toctree:: + :maxdepth: 1 + + troubleshooting/how-to-troubleshoot-backend + Let tools help you! ------------------- diff --git a/users/troubleshooting/how-to-troubleshoot-backend.rst b/users/troubleshooting/how-to-troubleshoot-backend.rst new file mode 100644 index 0000000..589c119 --- /dev/null +++ b/users/troubleshooting/how-to-troubleshoot-backend.rst @@ -0,0 +1,579 @@ +Troubleshooting backend +======================= + +A **backend**, often also called **data provider**, is an SSSD child +process that manages and **creates the cache**. This process talks to +LDAP server, performs different lookup queries and stores the results in +the cache. It also performs online **authentication** against LDAP or +Kerberos and **applies access and password policy** to the user that is +about to log in. + +Understanding the backend component +----------------------------------- + +Backend process and its files +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +There is one back-end process ``/usr/libexec/sssd/sssd_be`` for each +domain configured within ``sssd.conf``. Each backend process manages one +SSSD domain defined in the configuration file and also its trusted +domains (referred to as *subdomains*) that are auto-discovered at start +up. + +Let's take a look at this example configuration. + +.. code:: ini + + [sssd] + services = nss, pam, sudo + domains = IPA.PB, AD.PB + config_file_version = 2 + + [domain/IPA.PB] + id_provider = ipa + access_provider = ipa + ipa_domain = ipa.pb + + [domain/AD.PB] + id_provider = ad + ad_domain = ad.pb + ad_server = _srv_ + +We have two domains configured, ``IPA.PB`` and ``AD.PB``. Each domain +has its own cache file, which is shared between the parent domain and +its subdomains and its own log file. The cache file is stored in +``/var/lib/sss/db/$domain.ldb`` and the log file is located at +``/var/log/sssd/$domain.log``. + +The following example shows us the SSSD backend processes and cache and +log files location. + +.. code:: bash + + # ps aux | grep sssd_be + /usr/libexec/sssd/sssd_be --domain IPA.PB --uid 0 --gid 0 --debug-to-files + /usr/libexec/sssd/sssd_be --domain AD.PB --uid 0 --gid 0 --debug-to-files + + # ls -l /var/log/sssd/ + -rw-------. 1 root root 73870 Mar 20 10:50 sssd_IPA.PB.log + -rw-------. 1 root root 73870 Mar 20 10:50 sssd_AD.PB.log + + # ls -l /var/lib/sss/db/ + -rw-------. 1 root root 1286144 Mar 20 10:50 cache_IPA.PB.ldb + -rw-------. 1 root root 1286144 Mar 20 10:50 cache_AD.PB.ldb + +Data providers work flow +~~~~~~~~~~~~~~~~~~~~~~~~ + +Backend provides several services: *id, auth, access, etc.* (refer to +sssd manual page for the full list of services). Each service is +associated with one data provider through a configuration option, for +example the *id* service is set to *IPA* provider with +``id_provider = ipa``. Usually only *id* and *access* providers are set, +having the others default to the same provider as *id*. Data provider +tells SSSD how to talk with specific server implementation (LDAP, IPA, +Active Directory, Kerberos) and how its data schema and features are +translated into SSSD cache. + +When an SSSD responder calls a backend method a series of operations is +initiated. First a new Data Provider request is created. This request +decides what provider module and which method from this module will +handle the responder's request. Then it executes this method and awaits +until it is finished. After that a reply is constructed and sent back to +the responder. + +.. graphviz:: + + digraph { + center=true; + + a->b->c->d->e->f->g + a [shape=ellipse,label="Backend Method Called"] + b [shape=box,label="Create Data Provider request"] + c [shape=box,label="Load provider method"] + d [shape=box,label="Run provider method"] + e [shape=box,label="Provider method finished"] + f [shape=box,label="Finish Data Provider request"] + g [shape=ellipse,label="Return result to the caller"] + } + +Enable logging +-------------- + +First, to actually see something in the log file we need to enable +logging by setting a debug level. There are two ways to achieve this. + +#. Using ``debug_level`` directive in domain section in ``sssd.conf``. +#. Using ``sss_debuglevel`` tool which will set the level only for + current SSSD instance and will set the original level when SSSD is + restarted. + +.. code:: ini + + # cat /etc/sssd.conf + ... + [domain/IPA.PB] + ... + debug_level = 0x3ff0 + ... + + # systemctl start sssd.service + # sss_debuglevel 0x3ff0 + +Debug level ``0x3ff0`` contains enough information for all basic tasks. +Higher level providers details on input and output operations that +creates only noise that makes the log hard to read in most of the cases +so I do not recommend setting it unless it is needed. + +Reading the logs +---------------- + +This section contains a description of the most fundamental parts of the +backend and how are they described by log messages. + +Validating options +------------------ + +On startup, the backend prints into the domain log options read by SSSD +together with their configured values including options not explicitly +specified in the ``sssd.conf``. This is useful to see how the SSSD is +configured. + +.. code:: + + [dp_get_options] (0x0400): Option ldap_enumeration_search_timeout has value 60 + [dp_get_options] (0x0400): Option ldap_auth_disable_tls_never_use_in_production is FALSE + [dp_get_options] (0x0400): Option ldap_page_size has value 1000 + [dp_get_options] (0x0400): Option ldap_deref_threshold has value 10 + [dp_get_options] (0x0400): Option ldap_sasl_canonicalize is FALSE + [dp_get_options] (0x0400): Option ldap_connection_expire_timeout has value 900 + [dp_get_options] (0x0400): Option ldap_disable_paging is FALSE + [dp_get_options] (0x0400): Option ldap_idmap_range_min has value 200000 + [dp_get_options] (0x0400): Option ldap_idmap_range_max has value 2000200000 + [dp_get_options] (0x0400): Option ldap_idmap_range_size has value 200000 + +Data Provider request life cycle +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Usually, there is one thing that goes wrong. User is not found or is not +permitted access, authentication fails, group membership does not +contain all members, etc. It is good to first look at data provider +configuration to know what providers are being used and then to the data +provider request itself to see how it finished. This will give us +indicate where to look next. + +- What providers are used for different backend services? + +.. code:: + + [dp_load_configuration] (0x0100): Using [ipa] provider for [id] + [dp_load_configuration] (0x0100): Using [ipa] provider for [auth] + [dp_load_configuration] (0x0100): Using [ipa] provider for [access] + [dp_load_configuration] (0x0100): Using [ipa] provider for [chpass] + [dp_load_configuration] (0x0100): Using [ipa] provider for [sudo] + [dp_load_configuration] (0x0100): Using [ipa] provider for [autofs] + [dp_load_configuration] (0x0100): Using [ipa] provider for [selinux] + [dp_load_configuration] (0x0100): Using [ipa] provider for [hostid] + [dp_load_configuration] (0x0100): Using [ipa] provider for [subdomains] + +- Each data provider request start and end is marked in logs. It is + associated with other related messages with a number, using the + format ``DP Request [$request-type #$request-number]``. We can see + that there is one active data provider request of type ``Account`` in + the following example and it finish with success. There are no more + active requests when this one is finished. + +.. code:: + + [dp_attach_req] (0x0400): DP Request [Account #1]: New request. Flags [0x0001]. + [dp_attach_req] (0x0400): Number of active DP request: 1 + ... provider specific method logs ... + [dp_req_done] (0x0400): DP Request [Account #1]: Request handler finished [0]: Success + [_dp_req_recv] (0x0400): DP Request [Account #1]: Receiving request data. + [dp_req_reply_list_success] (0x0400): DP Request [Account #1]: Finished. Success. + [dp_req_reply_std] (0x1000): DP Request [Account #1]: Returning [Success]: 0,0,Success + dp_table_value_destructor] (0x0400): Removing [0:1:0x0001:1:V:ad.pb:name=user-1@ad.pb] from reply table + [dp_req_destructor] (0x0400): DP Request [Account #1]: Request removed. + [dp_req_destructor] (0x0400): Number of active DP request: 0 + +- If the request is not successful, this is the place where we would + see it: + +.. code:: + + [dp_req_reply_std] (0x1000): DP Request [Subdomains #0]: Returning [Provider is Offline]: 1,1432158212,Offline + +Failover information +~~~~~~~~~~~~~~~~~~~~ + +Failover is a crucial part of SSSD. If SSSD goes offline because it +cannot establish a connection to a server, this is the place to look for +the cause. It may be a DNS issue where we cannot resolve hostname or SRV +records. It may be a connection issue when the remote server is +unrechable because it is behind firewall, etc. + +- Primary and backup server discovered with SRV DNS resolution or from + configuration file + +.. code:: + + [fo_discover_srv_done] (0x0400): Got 1 servers + [fo_add_server_to_list] (0x0400): Inserted primary server 'master.ipa.pb:389' to service 'IPA' + +- The whole failover process: + +.. code:: + + # 1. We are trying to resolve IPA service + [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA' + + # 2. No cache resolution is present, we need to find server, pick one and resolve hostname + [get_port_status] (0x1000): Port status of port 0 for server '(no name)' is 'neutral' + [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 6 seconds + + #3. SRV DNS resolution requested + [resolve_srv_send] (0x0200): The status of SRV lookup is neutral + [resolv_discover_srv_next_domain] (0x0400): SRV resolution of service 'ldap'. Will use DNS discovery domain 'ipa.pb' + [resolv_getsrv_send] (0x0100): Trying to resolve SRV record of '_ldap._tcp.ipa.pb' + [schedule_request_timeout] (0x2000): Scheduling a timeout of 6 seconds + [schedule_timeout_watcher] (0x2000): Scheduling DNS timeout watcher + [resolv_getsrv_done] (0x1000): Using TTL [86400][request_watch_destructor] (0x0400): Deleting request watch + [fo_discover_srv_done] (0x0400): Got answer. Processing... + + #4. We have found one primary server + [fo_discover_srv_done] (0x0400): Got 1 servers + [fo_add_server_to_list] (0x0400): Inserted primary server 'master.ipa.pb:389' to service 'IPA' + [set_srv_data_status] (0x0100): Marking SRV lookup of service 'IPA' as 'resolved' + + #5. Now we need to resolve hostname + [get_server_status] (0x1000): Status of server 'master.ipa.pb' is 'name not resolved' + [resolv_gethostbyname_step] (0x2000): Querying files + [resolv_gethostbyname_files_send] (0x0100): Trying to resolve A record of 'master.ipa.pb' in files + [set_server_common_status] (0x0100): Marking server 'master.ipa.pb' as 'resolving name' + [resolv_gethostbyname_step] (0x2000): Querying files + [resolv_gethostbyname_files_send] (0x0100): Trying to resolve AAAA record of 'master.ipa.pb' in files + [resolv_gethostbyname_next] (0x0200): No more address families to retry + [resolv_gethostbyname_step] (0x2000): Querying DNS + [resolv_gethostbyname_dns_query] (0x0100): Trying to resolve A record of 'master.ipa.pb' in DNS + [schedule_request_timeout] (0x2000): Scheduling a timeout of 6 seconds + [schedule_timeout_watcher] (0x2000): Scheduling DNS timeout watcher + [sbus_remove_timeout] (0x2000): 0x2230a70 + [id_callback] (0x0100): Got id ack and version (1) from Monitor + [resolv_gethostbyname_dns_parse] (0x1000): Parsing an A reply + [request_watch_destructor] (0x0400): Deleting request watch + [set_server_common_status] (0x0100): Marking server 'master.ipa.pb' as 'name resolved' + + #6. Success, now we will connect to this server + [be_resolve_server_process] (0x1000): Saving the first resolved server + [be_resolve_server_process] (0x0200): Found address for server master.ipa.pb: [10.34.78.100] TTL 1200 + [ipa_resolve_callback] (0x0400): Constructed uri 'ldap://master.ipa.pb' + [sssd_async_socket_init_send] (0x0400): Setting 6 seconds timeout for connecting + [sdap_ldap_connect_callback_add] (0x1000): New LDAP connection to [ldap://master.ipa.pb:389/??base] with fd [21]. + +- If the failover process fails for some reason, it will be visible in these logs. For example: + +.. code:: + + [fo_resolve_service_done] (0x0020): Failed to resolve server 'invalid.ipa.pb': Domain name not found + [set_server_common_status] (0x0100): Marking server 'invalid.ipa.pb' as 'not working' + [be_resolve_server_process] (0x0080): Couldn't resolve server (invalid.ipa.pb), resolver returned [11]: Resource temporarily unavailable + [be_resolve_server_process] (0x1000): Trying with the next one! + [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA' + [get_server_status] (0x1000): Status of server 'invalid.ipa.pb' is 'not working' + [get_server_status] (0x1000): Status of server 'invalid.ipa.pb' is 'not working' + [fo_resolve_service_send] (0x0020): No available servers for service 'IPA' + [be_resolve_server_done] (0x1000): Server resolution failed: [5]: Input/output error + [sdap_id_op_connect_done] (0x0020): Failed to connect, going offline (5 [Input/output error]) + [be_mark_offline] (0x2000): Going offline! + +Discovered subdomains +~~~~~~~~~~~~~~~~~~~~~ + +If the uses comes from a subdomain, it is good to know what subdomains +(trusted domains in IPA and AD terminology) were discovered and how they +relate to each other. + +- Search for the following debug message to see what subdomains were + discovered. + +.. code:: + + [new_subdomain] (0x0400): Creating [ad.pb] as subdomain of [IPA.PB]! + [new_subdomain] (0x0400): Creating [child.ad.pb] as subdomain of [IPA.PB]! + +LDAP operations +~~~~~~~~~~~~~~~ + +We can see in the logs what LDAP operations were performed and how did +they finished and how many results were returned. If there is something +unexpected we can take the filter and search base and run a manual LDAP +search to see if the result match. If an LDAP operation fails and we are +online it indicates a problem on server (possible ACL issue). + +.. code:: + + # 1. What server are we connected to + [sdap_print_server] (0x2000): Searching 10.34.78.100:389 + + # 2. LDAP query with filter and search base + [sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with [(&(objectClass=ipasudorule)(ipaEnabledFlag=TRUE)(|(!(memberHost=*))(hostCategory=ALL)(memberHost=fqdn=client.sssd.pb,cn=computers,cn=accounts,dc=ipa,dc=pb)(memberHost=cn=test-hg,cn=hostgroups,cn=accounts,dc=ipa,dc=pb)))][cn=sudo,dc=ipa,dc=pb]. + + # 3. Requested attributes + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [objectClass] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [cn] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaUniqueID] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaEnabledFlag] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaSudoOpt] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaSudoRunAs] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaSudoRunAsGroup] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [memberAllowCmd] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [memberDenyCmd] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [memberHost] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [memberUser] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [sudoNotAfter] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [sudoNotBefore] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [sudoOrder] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [cmdCategory] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [hostCategory] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [userCategory] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaSudoRunAsUserCategory] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaSudoRunAsGroupCategory] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaSudoRunAsExtUser] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaSudoRunAsExtGroup] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaSudoRunAsExtUserGroup] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [externalUser] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [entryUSN] + [sdap_get_generic_ext_step] (0x2000): ldap_search_ext called, msgid = 11 + [sdap_op_add] (0x2000): New operation 11 timeout 6 + + # 4. We process attributes of each object + [sdap_parse_entry] (0x1000): OriginalDN: [ipaUniqueID=ea0116a4-d262-11e6-8cd7-001a4a2312a7,cn=sudorules,cn=sudo,dc=ipa,dc=pb]. + [sdap_parse_range] (0x2000): No sub-attributes for [objectClass] + [sdap_parse_range] (0x2000): No sub-attributes for [cn] + [sdap_parse_range] (0x2000): No sub-attributes for [ipaUniqueID] + [sdap_parse_range] (0x2000): No sub-attributes for [ipaEnabledFlag] + [sdap_parse_range] (0x2000): No sub-attributes for [memberAllowCmd] + [sdap_parse_range] (0x2000): No sub-attributes for [memberUser] + [sdap_parse_range] (0x2000): No sub-attributes for [hostCategory] + [sdap_parse_range] (0x2000): No sub-attributes for [ipaSudoRunAsUserCategory] + [sdap_parse_range] (0x2000): No sub-attributes for [ipaSudoRunAsGroupCategory] + [sdap_parse_range] (0x2000): No sub-attributes for [entryUSN] + +Authentication +~~~~~~~~~~~~~~ + +If authentication fail it is usually due to invalid or expired +credentials. It may also involve possible ``clock skew`` if using +Kerberos when client's time differs from the KDC time. We log each phase +of PAM authentication as a separate data provider requests so we can get +the exact phase that fails. + +- We print input data for each phase. The following example shows + successful authentication phase. + +.. code:: + + [dp_pam_handler] (0x0100): Got request with the following data + [pam_print_data] (0x0100): command: SSS_PAM_AUTHENTICATE + [pam_print_data] (0x0100): domain: IPA.PB + [pam_print_data] (0x0100): user: admin@ipa.pb + [pam_print_data] (0x0100): service: su + [pam_print_data] (0x0100): tty: pts/1 + [pam_print_data] (0x0100): ruser: pbrezina + [pam_print_data] (0x0100): rhost: + [pam_print_data] (0x0100): authtok type: 1 + [pam_print_data] (0x0100): newauthtok type: 0 + [pam_print_data] (0x0100): priv: 0 + [pam_print_data] (0x0100): cli_pid: 29865 + [pam_print_data] (0x0100): logon name: not set + [dp_attach_req] (0x0400): DP Request [PAM Authenticate #4]: New request. Flags [0000]. + ... + [dp_req_done] (0x0400): DP Request [PAM Authenticate #4]: Request handler finished [0]: Success + [_dp_req_recv] (0x0400): DP Request [PAM Authenticate #4]: Receiving request data. + [dp_req_destructor] (0x0400): DP Request [PAM Authenticate #4]: Request removed. + [dp_req_destructor] (0x0400): Number of active DP request: 0 + [dp_pam_reply] (0x1000): DP Request [PAM Authenticate #4]: Sending result [0][IPA.PB] + +- If communication with the remote server goes well, we always return + success, however the PAM result is logged in the ``dp_pam_reply`` + function: ``Sending result [0][IPA.PB]``. Value ``0`` means success, + other values indicate problem, such as ``17`` which indicates invalid + credentials. You can get the list of PAM status code and its meaning + `here `__. + +.. code:: + + [dp_pam_reply] (0x1000): DP Request [PAM Authenticate #4]: Sending result [17][IPA.PB] + +- If you are unable to find any debug messages regarding to authentication, + it is probably due to missing ``pam_sss`` in PAM configuration (for example + in ``/etc/pam.d/system-auth``). + +Access control +~~~~~~~~~~~~~~ + +If there is any access provider other than ``permit`` configured we may +also find out why the user access is denied (or the other way around, +permitted). This may indicate wrong host base access control +configuration on IPA or that the user account has expired, etc. + +Access control is part of **Account Management PAM Phase** and **PAM +Account Request** (for ``ipa`` provider also *PAM SELinux Request*, thus +to see where the access check begin look for the following messages: + +.. code:: + + [dp_pam_handler] (0x0100): Got request with the following data + [pam_print_data] (0x0100): command: SSS_PAM_ACCT_MGMT + ... + [dp_attach_req] (0x0400): DP Request [PAM Account #4]: New request. Flags [0000]. + ... + [dp_attach_req] (0x0400): DP Request [PAM SELinux #5]: New request. Flags [0000]. + +Which rules apply on the account check depends heavily on configuration +of ``access_provider``. This provider is set to ``permit`` by default, +which means that all users can access. Please refer to SSSD manual page +to see what providers can be set. We use ``ipa`` provider in this +example that involves two checks in addition to LDAP policy: 1) Host +Base Access Control (HBAC) and 2) SELinux policy. + +- Example of access granted. Notice that we return ``0`` in + ``dp_pam_reply``. + +.. code:: + + [dp_attach_req] (0x0400): DP Request [PAM Account #4]: New request. Flags [0000]. + ... + [hbac_evaluate] (0x0100): ALLOWED by rule [allow_all]. + [hbac_evaluate] (0x0100): hbac_evaluate() >] + [ipa_hbac_evaluate_rules] (0x0080): Access granted by HBAC rule [allow_all] + [dp_req_done] (0x0400): DP Request [PAM Account #4]: Request handler finished [0]: Success + [_dp_req_recv] (0x0400): DP Request [PAM Account #4]: Receiving request data. + [dp_req_destructor] (0x0400): DP Request [PAM Account #4]: Request removed. + [dp_req_destructor] (0x0400): Number of active DP request: 0 + [dp_attach_req] (0x0400): DP Request [PAM SELinux #5]: New request. Flags [0000]. + [dp_attach_req] (0x0400): Number of active DP request: 1 + ... + [dp_req_done] (0x0400): DP Request [PAM SELinux #5]: Request handler finished [0]: Success + [_dp_req_recv] (0x0400): DP Request [PAM SELinux #5]: Receiving request data. + [dp_req_destructor] (0x0400): DP Request [PAM SELinux #5]: Request removed. + [dp_req_destructor] (0x0400): Number of active DP request: 0 + [dp_pam_reply] (0x1000): DP Request [PAM Account #4]: Sending result [0][IPA.PB] + +- Example of access denied due to missing HBAC rules (notice number + ``6`` in ``dp_pam_reply``: + +.. code:: + + [ipa_hbac_rule_info_done] (0x0080): No rules apply to this host + [ipa_pam_access_handler_done] (0x0020): No HBAC rules find, denying access + [dp_req_done] (0x0400): DP Request [PAM Account #5]: Request handler finished [0]: Success + [_dp_req_recv] (0x0400): DP Request [PAM Account #5]: Receiving request data. + [dp_req_destructor] (0x0400): DP Request [PAM Account #5]: Request removed. + [dp_req_destructor] (0x0400): Number of active DP request: 0 + [dp_pam_reply] (0x1000): DP Request [PAM Account #5]: Sending result [6][IPA.PB] + +- If you are unable to find any debug messages regarding to access control, + it is probably due to missing ``pam_sss`` in PAM configuration (for example in + ``/etc/pam.d/system-auth``). + +Timeouts +~~~~~~~~ + +Many SSSD operations have default timeouts set, often the timeouts are +configurable and can be found in the provider backend or sssd.conf man +pages. Sample log output below shows SSSD failing a search operation +due to timeout. + +.. code:: + + [sdap_op_timeout] (0x1000): Issuing timeout for 11 + [sdap_op_destructor] (0x1000): Abandoning operation 11 + [generic_ext_search_handler] (0x0040): sdap_get_generic_ext_recv failed [110]: Connection timed out + +The actual SSSD operation which timed out may not be in the same section +of the log and requires searching earlier in the logs. We can search the +logs for the operation number(11 in the above logs) such as ``msgid = 11`` + +.. code:: + + [sdap_print_server] (0x2000): Searching 10.34.78.100:389 + [sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with [(&(cn=sudo-test)(|(objectClass=ipaUserGroup)(objectClass=posixGroup))(cn=*)(&(gidNumber=*)(!(gidNumber=0))))][cn=accounts,dc=idm,dc=example,dc=com]. + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [objectClass] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [posixGroup] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [cn] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [userPassword] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [gidNumber] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [member] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaUniqueID] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaNTSecurityIdentifier] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [modifyTimestamp] + [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [entryUSN] + [sssd[be[example.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaExternalMember] + [sssd[be[example.com]]] [sdap_get_generic_ext_step] (0x2000): ldap_search_ext called, msgid = 11 + [sssd[be[example.com]]] [sdap_op_add] (0x2000): New operation 11 timeout 30 + +Note the last line contains the operation number and timeout value. The +30 second timeout matches up with the log message where the timeout +was triggered. + +Performing manual LDAP query +---------------------------- + +It is often useful to run the same query as SSSD manually with +``ldapsearch`` tool. Look for function named +``sdap_get_generic_ext_step`` to aquire the search base and filter +that SSSD used against an LDAP server and ``sdap_print_server`` to +see what server did it connect to. The messages are in form of: + +.. code:: + + [sdap_print_server] (0x2000): Searching $ip:$port + [sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with [$filter][$search-base] + +- Anonymous bind (no authentication) + +.. code:: + + ldapsearch -x -H ldap://$ip:$port -b $search-base '$filter' + +- Simple bind (authentication with credentials) + +.. code:: + + ldapsearch -x -D "cn=Directory Manager" -w "$password" -H ldap://$ip:$port -b $search-base '$filter' + +- GSSAPI (authentication through Kerberos) + +.. code:: bash + + klist -k + kinit -k '$principal' + ldapsearch -Y GSSAPI -H ldap://$ip:$port -b $search-base '$filter' + +Asking for help +--------------- + +If you did not have any luck with debugging the issue yourself you +can reach us through +`sssd-users `__ +mailing list or `#sssd channel on +freenode.net `__ IRC. + +It would be great if you can also provide all the information that you +have found so far to speed things up. Such as: + +- Description of the issue +- What do you expect that should happen +- What did you see in the logs +- All SSSD logs with debug level set to ``0x3ff0`` (please always + **include whole log files**, not only snippets) +- Version of SSSD an what server do you use +- Other information depending on the area of investigation + + - expected group membership + - domains relationships + - HBAC rules + - Other access control settings + - And everything that comes to your mind that you think may be + helpful