#5 Add backend troubleshooting guide
Merged 6 years ago by pbrezina. Opened 7 years ago by pbrezina.
SSSD/ pbrezina/docs backend  into  master

file modified
+1 -1
@@ -30,7 +30,7 @@ 

  # Add any Sphinx extension module names here, as strings. They can be

  # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom

  # ones.

- extensions = []

+ extensions = ['sphinx.ext.graphviz']

  

  # Add any paths that contain templates here, relative to this directory.

  templates_path = ['_templates']

file modified
+12
@@ -16,6 +16,18 @@ 

  authentication doesn't work in your case, please make sure you can at least

  obtain info from about the user with ``getent passwd $user`` and ``id``.

  

+ Troubleshooting guides

+ ----------------------

+ 

+ We are trying to document on examples how to read debug messages and how to

+ troubleshoot specific issues. These are currently available guides

+ that can help you:

+ 

+ .. toctree::

+    :maxdepth: 1

+ 

+    troubleshooting/how-to-troubleshoot-backend

+ 

  Let tools help you!

  -------------------

  

@@ -0,0 +1,579 @@ 

+ Troubleshooting backend

+ =======================

+ 

+ A **backend**, often also called **data provider**, is an SSSD child

+ process that manages and **creates the cache**. This process talks to

+ LDAP server, performs different lookup queries and stores the results in

+ the cache. It also performs online **authentication** against LDAP or

+ Kerberos and **applies access and password policy** to the user that is

+ about to log in.

+ 

+ Understanding the backend component

+ -----------------------------------

+ 

+ Backend process and its files

+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+ 

+ There is one back-end process ``/usr/libexec/sssd/sssd_be`` for each

+ domain configured within ``sssd.conf``. Each backend process manages one

+ SSSD domain defined in the configuration file and also its trusted

+ domains (referred to as *subdomains*) that are auto-discovered at start

+ up.

+ 

+ Let's take a look at this example configuration.

+ 

+ .. code:: ini

+ 

+     [sssd]

+     services = nss, pam, sudo

+     domains = IPA.PB, AD.PB

+     config_file_version = 2

+ 

+     [domain/IPA.PB]

+     id_provider = ipa

+     access_provider = ipa

+     ipa_domain = ipa.pb

+ 

+     [domain/AD.PB]

+     id_provider = ad

+     ad_domain = ad.pb

+     ad_server = _srv_

+ 

+ We have two domains configured, ``IPA.PB`` and ``AD.PB``. Each domain

+ has its own cache file, which is shared between the parent domain and

+ its subdomains and its own log file. The cache file is stored in

+ ``/var/lib/sss/db/$domain.ldb`` and the log file is located at

+ ``/var/log/sssd/$domain.log``.

+ 

+ The following example shows us the SSSD backend processes and cache and

+ log files location.

+ 

+ .. code:: bash

+ 

+     # ps aux | grep sssd_be

+     /usr/libexec/sssd/sssd_be --domain IPA.PB --uid 0 --gid 0 --debug-to-files

+     /usr/libexec/sssd/sssd_be --domain AD.PB --uid 0 --gid 0 --debug-to-files

+ 

+     # ls -l /var/log/sssd/

+     -rw-------. 1 root root 73870 Mar 20 10:50 sssd_IPA.PB.log

+     -rw-------. 1 root root 73870 Mar 20 10:50 sssd_AD.PB.log

+ 

+     # ls -l /var/lib/sss/db/

+     -rw-------. 1 root root 1286144 Mar 20 10:50 cache_IPA.PB.ldb

+     -rw-------. 1 root root 1286144 Mar 20 10:50 cache_AD.PB.ldb

+ 

+ Data providers work flow

+ ~~~~~~~~~~~~~~~~~~~~~~~~

+ 

+ Backend provides several services: *id, auth, access, etc.* (refer to

+ sssd manual page for the full list of services). Each service is

+ associated with one data provider through a configuration option, for

+ example the *id* service is set to *IPA* provider with

+ ``id_provider = ipa``. Usually only *id* and *access* providers are set,

+ having the others default to the same provider as *id*. Data provider

+ tells SSSD how to talk with specific server implementation (LDAP, IPA,

+ Active Directory, Kerberos) and how its data schema and features are

+ translated into SSSD cache.

+ 

+ When an SSSD responder calls a backend method a series of operations is

+ initiated. First a new Data Provider request is created. This request

+ decides what provider module and which method from this module will

+ handle the responder's request. Then it executes this method and awaits

+ until it is finished. After that a reply is constructed and sent back to

+ the responder.

+ 

+ .. graphviz::

+    

+    digraph {

+       center=true;

+    

+       a->b->c->d->e->f->g

+       a  [shape=ellipse,label="Backend Method Called"]

+       b  [shape=box,label="Create Data Provider request"]

+       c  [shape=box,label="Load provider method"]

+       d  [shape=box,label="Run provider method"]

+       e  [shape=box,label="Provider method finished"]

+       f  [shape=box,label="Finish Data Provider request"]

+       g  [shape=ellipse,label="Return result to the caller"]

+    }

+ 

+ Enable logging

+ --------------

+ 

+ First, to actually see something in the log file we need to enable

+ logging by setting a debug level. There are two ways to achieve this.

+ 

+ #. Using ``debug_level`` directive in domain section in ``sssd.conf``.

+ #. Using ``sss_debuglevel`` tool which will set the level only for

+    current SSSD instance and will set the original level when SSSD is

+    restarted.

+ 

+ .. code:: ini

+ 

+     # cat /etc/sssd.conf

+     ...

+     [domain/IPA.PB]

+     ...

+     debug_level = 0x3ff0

+     ...

+ 

+     # systemctl start sssd.service

+     # sss_debuglevel 0x3ff0

+ 

+ Debug level ``0x3ff0`` contains enough information for all basic tasks.

+ Higher level providers details on input and output operations that

+ creates only noise that makes the log hard to read in most of the cases

+ so I do not recommend setting it unless it is needed.

+ 

+ Reading the logs

+ ----------------

+ 

+ This section contains a description of the most fundamental parts of the

+ backend and how are they described by log messages.

+ 

+ Validating options

+ ------------------

+ 

+ On startup, the backend prints into the domain log options read by SSSD

+ together with their configured values including options not explicitly

+ specified in the ``sssd.conf``. This is useful to see how the SSSD is

+ configured.

+  

+ .. code::

+  

+    [dp_get_options] (0x0400): Option ldap_enumeration_search_timeout has value 60

+    [dp_get_options] (0x0400): Option ldap_auth_disable_tls_never_use_in_production is FALSE

+    [dp_get_options] (0x0400): Option ldap_page_size has value 1000

+    [dp_get_options] (0x0400): Option ldap_deref_threshold has value 10

+    [dp_get_options] (0x0400): Option ldap_sasl_canonicalize is FALSE

+    [dp_get_options] (0x0400): Option ldap_connection_expire_timeout has value 900

+    [dp_get_options] (0x0400): Option ldap_disable_paging is FALSE

+    [dp_get_options] (0x0400): Option ldap_idmap_range_min has value 200000

+    [dp_get_options] (0x0400): Option ldap_idmap_range_max has value 2000200000

+    [dp_get_options] (0x0400): Option ldap_idmap_range_size has value 200000

+ 

+ Data Provider request life cycle

+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+ 

+ Usually, there is one thing that goes wrong. User is not found or is not

+ permitted access, authentication fails, group membership does not

+ contain all members, etc. It is good to first look at data provider

+ configuration to know what providers are being used and then to the data

+ provider request itself to see how it finished. This will give us

+ indicate where to look next.

+ 

+ -  What providers are used for different backend services?

+ 

+ .. code::

+ 

+    [dp_load_configuration] (0x0100): Using [ipa] provider for [id]

+    [dp_load_configuration] (0x0100): Using [ipa] provider for [auth]

+    [dp_load_configuration] (0x0100): Using [ipa] provider for [access]

+    [dp_load_configuration] (0x0100): Using [ipa] provider for [chpass]

+    [dp_load_configuration] (0x0100): Using [ipa] provider for [sudo]

+    [dp_load_configuration] (0x0100): Using [ipa] provider for [autofs]

+    [dp_load_configuration] (0x0100): Using [ipa] provider for [selinux]

+    [dp_load_configuration] (0x0100): Using [ipa] provider for [hostid]

+    [dp_load_configuration] (0x0100): Using [ipa] provider for [subdomains]

+ 

+ -  Each data provider request start and end is marked in logs. It is

+    associated with other related messages with a number, using the

+    format ``DP Request [$request-type #$request-number]``. We can see

+    that there is one active data provider request of type ``Account`` in

+    the following example and it finish with success. There are no more

+    active requests when this one is finished.

+ 

+ .. code::

+ 

+    [dp_attach_req] (0x0400): DP Request [Account #1]: New request. Flags [0x0001].

+    [dp_attach_req] (0x0400): Number of active DP request: 1

+    ... provider specific method logs ...

+    [dp_req_done] (0x0400): DP Request [Account #1]: Request handler finished [0]: Success

+    [_dp_req_recv] (0x0400): DP Request [Account #1]: Receiving request data.

+    [dp_req_reply_list_success] (0x0400): DP Request [Account #1]: Finished. Success.

+    [dp_req_reply_std] (0x1000): DP Request [Account #1]: Returning [Success]: 0,0,Success

+    dp_table_value_destructor] (0x0400): Removing [0:1:0x0001:1:V:ad.pb:name=user-1@ad.pb] from reply table

+    [dp_req_destructor] (0x0400): DP Request [Account #1]: Request removed.

+    [dp_req_destructor] (0x0400): Number of active DP request: 0

+ 

+ -  If the request is not successful, this is the place where we would

+    see it:

+ 

+ .. code::

+ 

+    [dp_req_reply_std] (0x1000): DP Request [Subdomains #0]: Returning [Provider is Offline]: 1,1432158212,Offline

+ 

+ Failover information

+ ~~~~~~~~~~~~~~~~~~~~

+ 

+ Failover is a crucial part of SSSD. If SSSD goes offline because it

+ cannot establish a connection to a server, this is the place to look for

+ the cause. It may be a DNS issue where we cannot resolve hostname or SRV

+ records. It may be a connection issue when the remote server is

+ unrechable because it is behind firewall, etc.

+ 

+ -  Primary and backup server discovered with SRV DNS resolution or from

+    configuration file

+ 

+ .. code::

+ 

+    [fo_discover_srv_done] (0x0400): Got 1 servers

+    [fo_add_server_to_list] (0x0400): Inserted primary server 'master.ipa.pb:389' to service 'IPA'

+ 

+ -  The whole failover process:

+ 

+ .. code::

+ 

+    # 1. We are trying to resolve IPA service

+    [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA'

+ 

+    # 2. No cache resolution is present, we need to find server, pick one and resolve hostname

+    [get_port_status] (0x1000): Port status of port 0 for server '(no name)' is 'neutral'

+    [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 6 seconds

+ 

+    #3. SRV DNS resolution requested

+    [resolve_srv_send] (0x0200): The status of SRV lookup is neutral

+    [resolv_discover_srv_next_domain] (0x0400): SRV resolution of service 'ldap'. Will use DNS discovery domain 'ipa.pb'

+    [resolv_getsrv_send] (0x0100): Trying to resolve SRV record of '_ldap._tcp.ipa.pb'

+    [schedule_request_timeout] (0x2000): Scheduling a timeout of 6 seconds

+    [schedule_timeout_watcher] (0x2000): Scheduling DNS timeout watcher

+    [resolv_getsrv_done] (0x1000): Using TTL [86400][request_watch_destructor] (0x0400): Deleting request watch

+    [fo_discover_srv_done] (0x0400): Got answer. Processing...

+ 

+    #4. We have found one primary server

+    [fo_discover_srv_done] (0x0400): Got 1 servers

+    [fo_add_server_to_list] (0x0400): Inserted primary server   'master.ipa.pb:389' to service 'IPA'

+    [set_srv_data_status] (0x0100): Marking SRV lookup of service 'IPA' as 'resolved'

+ 

+    #5. Now we need to resolve hostname

+    [get_server_status] (0x1000): Status of server 'master.ipa.pb' is 'name not resolved'

+    [resolv_gethostbyname_step] (0x2000): Querying files

+    [resolv_gethostbyname_files_send] (0x0100): Trying to resolve A record of 'master.ipa.pb' in files

+    [set_server_common_status] (0x0100): Marking server 'master.ipa.pb' as 'resolving name'

+    [resolv_gethostbyname_step] (0x2000): Querying files

+    [resolv_gethostbyname_files_send] (0x0100): Trying to resolve AAAA record of 'master.ipa.pb' in files

+    [resolv_gethostbyname_next] (0x0200): No more address families to retry

+    [resolv_gethostbyname_step] (0x2000): Querying DNS

+    [resolv_gethostbyname_dns_query] (0x0100): Trying to resolve A record of 'master.ipa.pb' in DNS

+    [schedule_request_timeout] (0x2000): Scheduling a timeout of 6 seconds

+    [schedule_timeout_watcher] (0x2000): Scheduling DNS timeout watcher

+    [sbus_remove_timeout] (0x2000): 0x2230a70

+    [id_callback] (0x0100): Got id ack and version (1) from Monitor

+    [resolv_gethostbyname_dns_parse] (0x1000): Parsing an A reply

+    [request_watch_destructor] (0x0400): Deleting request watch

+    [set_server_common_status] (0x0100): Marking server 'master.ipa.pb' as 'name resolved'

+ 

+    #6. Success, now we will connect to this server

+    [be_resolve_server_process] (0x1000): Saving the first resolved server

+    [be_resolve_server_process] (0x0200): Found address for server master.ipa.pb: [10.34.78.100] TTL 1200

+    [ipa_resolve_callback] (0x0400): Constructed uri 'ldap://master.ipa.pb'

+    [sssd_async_socket_init_send] (0x0400): Setting 6 seconds timeout for connecting

+    [sdap_ldap_connect_callback_add] (0x1000): New LDAP connection to [ldap://master.ipa.pb:389/??base] with fd [21].

+ 

+ - If the failover process fails for some reason, it will be visible in these logs. For example:

+ 

+ .. code::

+ 

+    [fo_resolve_service_done] (0x0020): Failed to resolve server 'invalid.ipa.pb': Domain name not found

+    [set_server_common_status] (0x0100): Marking server 'invalid.ipa.pb' as 'not working'

+    [be_resolve_server_process] (0x0080): Couldn't resolve server (invalid.ipa.pb), resolver returned [11]: Resource temporarily unavailable

+    [be_resolve_server_process] (0x1000): Trying with the next one!

+    [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA'

+    [get_server_status] (0x1000): Status of server 'invalid.ipa.pb' is 'not working'

+    [get_server_status] (0x1000): Status of server 'invalid.ipa.pb' is 'not working'

+    [fo_resolve_service_send] (0x0020): No available servers for service 'IPA'

+    [be_resolve_server_done] (0x1000): Server resolution failed: [5]: Input/output error

+    [sdap_id_op_connect_done] (0x0020): Failed to connect, going offline (5 [Input/output error])

+    [be_mark_offline] (0x2000): Going offline!

+ 

+ Discovered subdomains

+ ~~~~~~~~~~~~~~~~~~~~~

+ 

+ If the uses comes from a subdomain, it is good to know what subdomains

+ (trusted domains in IPA and AD terminology) were discovered and how they

+ relate to each other.

+ 

+ -  Search for the following debug message to see what subdomains were

+    discovered.

+ 

+ .. code::

+ 

+    [new_subdomain] (0x0400): Creating [ad.pb] as subdomain of [IPA.PB]!

+    [new_subdomain] (0x0400): Creating [child.ad.pb] as subdomain of [IPA.PB]!

+ 

+ LDAP operations

+ ~~~~~~~~~~~~~~~

+ 

+ We can see in the logs what LDAP operations were performed and how did

+ they finished and how many results were returned. If there is something

+ unexpected we can take the filter and search base and run a manual LDAP

+ search to see if the result match. If an LDAP operation fails and we are

+ online it indicates a problem on server (possible ACL issue).

+ 

+ .. code::

+ 

+     # 1. What server are we connected to

+     [sdap_print_server] (0x2000): Searching 10.34.78.100:389

+ 

+     # 2. LDAP query with filter and search base

+     [sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with [(&(objectClass=ipasudorule)(ipaEnabledFlag=TRUE)(|(!(memberHost=*))(hostCategory=ALL)(memberHost=fqdn=client.sssd.pb,cn=computers,cn=accounts,dc=ipa,dc=pb)(memberHost=cn=test-hg,cn=hostgroups,cn=accounts,dc=ipa,dc=pb)))][cn=sudo,dc=ipa,dc=pb].

+ 

+     # 3. Requested attributes

+     [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [objectClass]

+     [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [cn]

+     [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaUniqueID]

+     [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaEnabledFlag]

+     [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaSudoOpt]

+     [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaSudoRunAs]

+     [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaSudoRunAsGroup]

+     [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [memberAllowCmd]

+     [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [memberDenyCmd]

+     [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [memberHost]

+     [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [memberUser]

+     [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [sudoNotAfter]

+     [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [sudoNotBefore]

+     [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [sudoOrder]

+     [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [cmdCategory]

+     [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [hostCategory]

+     [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [userCategory]

+     [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaSudoRunAsUserCategory]

+     [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaSudoRunAsGroupCategory]

+     [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaSudoRunAsExtUser]

+     [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaSudoRunAsExtGroup]

+     [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaSudoRunAsExtUserGroup]

+     [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [externalUser]

+     [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [entryUSN]

+     [sdap_get_generic_ext_step] (0x2000): ldap_search_ext called, msgid = 11

+     [sdap_op_add] (0x2000): New operation 11 timeout 6

+ 

+     # 4. We process attributes of each object

+     [sdap_parse_entry] (0x1000): OriginalDN: [ipaUniqueID=ea0116a4-d262-11e6-8cd7-001a4a2312a7,cn=sudorules,cn=sudo,dc=ipa,dc=pb].

+     [sdap_parse_range] (0x2000): No sub-attributes for [objectClass]

+     [sdap_parse_range] (0x2000): No sub-attributes for [cn]

+     [sdap_parse_range] (0x2000): No sub-attributes for [ipaUniqueID]

+     [sdap_parse_range] (0x2000): No sub-attributes for [ipaEnabledFlag]

+     [sdap_parse_range] (0x2000): No sub-attributes for [memberAllowCmd]

+     [sdap_parse_range] (0x2000): No sub-attributes for [memberUser]

+     [sdap_parse_range] (0x2000): No sub-attributes for [hostCategory]

+     [sdap_parse_range] (0x2000): No sub-attributes for [ipaSudoRunAsUserCategory]

+     [sdap_parse_range] (0x2000): No sub-attributes for [ipaSudoRunAsGroupCategory]

+     [sdap_parse_range] (0x2000): No sub-attributes for [entryUSN]

+ 

+ Authentication

+ ~~~~~~~~~~~~~~

+ 

+ If authentication fail it is usually due to invalid or expired

+ credentials. It may also involve possible ``clock skew`` if using

+ Kerberos when client's time differs from the KDC time. We log each phase

+ of PAM authentication as a separate data provider requests so we can get

+ the exact phase that fails.

+ 

+ -  We print input data for each phase. The following example shows

+    successful authentication phase.

+ 

+ .. code::

+ 

+    [dp_pam_handler] (0x0100): Got request with the following data

+    [pam_print_data] (0x0100): command: SSS_PAM_AUTHENTICATE

+    [pam_print_data] (0x0100): domain: IPA.PB

+    [pam_print_data] (0x0100): user: admin@ipa.pb

+    [pam_print_data] (0x0100): service: su

+    [pam_print_data] (0x0100): tty: pts/1

+    [pam_print_data] (0x0100): ruser: pbrezina

+    [pam_print_data] (0x0100): rhost:

+    [pam_print_data] (0x0100): authtok type: 1

+    [pam_print_data] (0x0100): newauthtok type: 0

+    [pam_print_data] (0x0100): priv: 0

+    [pam_print_data] (0x0100): cli_pid: 29865

+    [pam_print_data] (0x0100): logon name: not set

+    [dp_attach_req] (0x0400): DP Request [PAM Authenticate #4]: New request. Flags [0000].

+    ...

+    [dp_req_done] (0x0400): DP Request [PAM Authenticate #4]: Request handler finished [0]: Success

+    [_dp_req_recv] (0x0400): DP Request [PAM Authenticate #4]: Receiving request data.

+    [dp_req_destructor] (0x0400): DP Request [PAM Authenticate #4]: Request removed.

+    [dp_req_destructor] (0x0400): Number of active DP request: 0

+    [dp_pam_reply] (0x1000): DP Request [PAM Authenticate #4]: Sending result [0][IPA.PB]

+ 

+ -  If communication with the remote server goes well, we always return

+    success, however the PAM result is logged in the ``dp_pam_reply``

+    function: ``Sending result [0][IPA.PB]``. Value ``0`` means success,

+    other values indicate problem, such as ``17`` which indicates invalid

+    credentials. You can get the list of PAM status code and its meaning

+    `here <http://pubs.opengroup.org/onlinepubs/8329799/chap5.htm#tagcjh_06_02_01>`__.

+ 

+ .. code::

+ 

+    [dp_pam_reply] (0x1000): DP Request [PAM Authenticate #4]: Sending result [17][IPA.PB]

+    

+ - If you are unable to find any debug messages regarding to authentication,

+   it is probably due to missing ``pam_sss`` in PAM configuration (for example

+   in ``/etc/pam.d/system-auth``).

+ 

+ Access control

+ ~~~~~~~~~~~~~~

+ 

+ If there is any access provider other than ``permit`` configured we may

+ also find out why the user access is denied (or the other way around,

+ permitted). This may indicate wrong host base access control

+ configuration on IPA or that the user account has expired, etc.

+ 

+ Access control is part of **Account Management PAM Phase** and **PAM

+ Account Request** (for ``ipa`` provider also *PAM SELinux Request*, thus

+ to see where the access check begin look for the following messages:

+ 

+ .. code::

+ 

+     [dp_pam_handler] (0x0100): Got request with the following data

+     [pam_print_data] (0x0100): command: SSS_PAM_ACCT_MGMT

+     ...

+     [dp_attach_req] (0x0400): DP Request [PAM Account #4]: New request. Flags [0000].

+     ...

+     [dp_attach_req] (0x0400): DP Request [PAM SELinux #5]: New request. Flags [0000].

+ 

+ Which rules apply on the account check depends heavily on configuration

+ of ``access_provider``. This provider is set to ``permit`` by default,

+ which means that all users can access. Please refer to SSSD manual page

+ to see what providers can be set. We use ``ipa`` provider in this

+ example that involves two checks in addition to LDAP policy: 1) Host

+ Base Access Control (HBAC) and 2) SELinux policy.

+ 

+ -  Example of access granted. Notice that we return ``0`` in

+    ``dp_pam_reply``.

+ 

+ .. code::

+ 

+    [dp_attach_req] (0x0400): DP Request [PAM Account #4]: New request. Flags [0000].

+    ...

+    [hbac_evaluate] (0x0100): ALLOWED by rule [allow_all].

+    [hbac_evaluate] (0x0100): hbac_evaluate() >]

+    [ipa_hbac_evaluate_rules] (0x0080): Access granted by HBAC rule [allow_all]

+    [dp_req_done] (0x0400): DP Request [PAM Account #4]: Request handler finished [0]: Success

+    [_dp_req_recv] (0x0400): DP Request [PAM Account #4]: Receiving request data.

+    [dp_req_destructor] (0x0400): DP Request [PAM Account #4]: Request removed.

+    [dp_req_destructor] (0x0400): Number of active DP request: 0

+    [dp_attach_req] (0x0400): DP Request [PAM SELinux #5]: New request. Flags [0000].

+    [dp_attach_req] (0x0400): Number of active DP request: 1

+    ...

+    [dp_req_done] (0x0400): DP Request [PAM SELinux #5]: Request handler finished [0]: Success

+    [_dp_req_recv] (0x0400): DP Request [PAM SELinux #5]: Receiving request data.

+    [dp_req_destructor] (0x0400): DP Request [PAM SELinux #5]: Request removed.

+    [dp_req_destructor] (0x0400): Number of active DP request: 0

+    [dp_pam_reply] (0x1000): DP Request [PAM Account #4]: Sending result [0][IPA.PB]

+ 

+ -  Example of access denied due to missing HBAC rules (notice number

+    ``6`` in ``dp_pam_reply``:

+ 

+ .. code::

+ 

+    [ipa_hbac_rule_info_done] (0x0080): No rules apply to this host

+    [ipa_pam_access_handler_done] (0x0020): No HBAC rules find, denying access

+    [dp_req_done] (0x0400): DP Request [PAM Account #5]: Request handler finished [0]: Success

+    [_dp_req_recv] (0x0400): DP Request [PAM Account #5]: Receiving request data.

+    [dp_req_destructor] (0x0400): DP Request [PAM Account #5]: Request removed.

+    [dp_req_destructor] (0x0400): Number of active DP request: 0

+    [dp_pam_reply] (0x1000): DP Request [PAM Account #5]: Sending result [6][IPA.PB]

+    

+ - If you are unable to find any debug messages regarding to access control,

+   it is probably due to missing ``pam_sss`` in PAM configuration (for example in

+   ``/etc/pam.d/system-auth``).

+  

+ Timeouts

+ ~~~~~~~~

+ 

+ Many SSSD operations have default timeouts set, often the timeouts are 

+ configurable and can be found in the provider backend or sssd.conf man

+ pages. Sample log output below shows SSSD failing a search operation 

+ due to timeout.

+   

+ .. code::

+ 

+    [sdap_op_timeout] (0x1000): Issuing timeout for 11

+    [sdap_op_destructor] (0x1000): Abandoning operation 11

+    [generic_ext_search_handler] (0x0040): sdap_get_generic_ext_recv failed [110]: Connection timed out

+  

+ The actual SSSD operation which timed out may not be in the same section

+ of the log and requires searching earlier in the logs. We can search the

+ logs for the operation number(11 in the above logs) such as ``msgid = 11``

+  

+ .. code::

+ 

+    [sdap_print_server] (0x2000): Searching 10.34.78.100:389

+    [sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with [(&(cn=sudo-test)(|(objectClass=ipaUserGroup)(objectClass=posixGroup))(cn=*)(&(gidNumber=*)(!(gidNumber=0))))][cn=accounts,dc=idm,dc=example,dc=com].

+    [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [objectClass]

+    [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [posixGroup]

+    [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [cn]

+    [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [userPassword]

+    [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [gidNumber]

+    [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [member]

+    [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaUniqueID]

+    [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaNTSecurityIdentifier]

+    [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [modifyTimestamp]

+    [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [entryUSN]

+    [sssd[be[example.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaExternalMember]

+    [sssd[be[example.com]]] [sdap_get_generic_ext_step] (0x2000): ldap_search_ext called, msgid = 11

+    [sssd[be[example.com]]] [sdap_op_add] (0x2000): New operation 11 timeout 30

+     

+ Note the last line contains the operation number and timeout value. The

+ 30 second timeout matches up with the log message where the timeout

+ was triggered.

+ 

+ Performing manual LDAP query

+ ----------------------------

+ 

+ It is often useful to run the same query as SSSD manually with

+ ``ldapsearch`` tool. Look for function named

+ ``sdap_get_generic_ext_step`` to aquire the search base and filter

+ that SSSD used against an LDAP server and ``sdap_print_server`` to

+ see what server did it connect to. The messages are in form of:

+ 

+ .. code::

+ 

+    [sdap_print_server] (0x2000): Searching $ip:$port

+    [sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with [$filter][$search-base]

+ 

+ -  Anonymous bind (no authentication)

+ 

+ .. code::

+ 

+    ldapsearch -x -H ldap://$ip:$port -b $search-base '$filter'

+ 

+ -  Simple bind (authentication with credentials)

+ 

+ .. code::

+ 

+    ldapsearch -x -D "cn=Directory Manager" -w "$password" -H ldap://$ip:$port -b $search-base '$filter'

+ 

+ -  GSSAPI (authentication through Kerberos)

+ 

+ .. code:: bash

+ 

+    klist -k

+    kinit -k '$principal'

+    ldapsearch -Y GSSAPI -H ldap://$ip:$port -b $search-base '$filter'

+ 

+ Asking for help

+ ---------------

+ 

+ If you did not have any luck with debugging the issue yourself you

+ can reach us through

+ `sssd-users <https://fedorahosted.org/mailman/listinfo/sssd-users>`__

+ mailing list or `#sssd channel on

+ freenode.net <irc://irc.freenode.net/sssd>`__ IRC.

+ 

+ It would be great if you can also provide all the information that you

+ have found so far to speed things up. Such as:

+ 

+ -  Description of the issue

+ -  What do you expect that should happen

+ -  What did you see in the logs

+ -  All SSSD logs with debug level set to ``0x3ff0`` (please always

+    **include whole log files**, not only snippets)

+ -  Version of SSSD an what server do you use

+ -  Other information depending on the area of investigation

+ 

+    -  expected group membership

+    -  domains relationships

+    -  HBAC rules

+    -  Other access control settings

+    -  And everything that comes to your mind that you think may be

+       helpful

no initial comment

Hello Pavel,

Nice work on this guide, I think it will be very useful!

I used the Fork and Edit button to add some information and make some minor changes.

https://pagure.io/fork/jstephen/SSSD/docs/c/f24625499ccebc47a01067cfd5a8411cbdfd2e4e?branch=backend

Besides the changes above, some other thoughts were

  • Could it be worth mentioning that the authentication/access control SSSD handlers are only called if the PAM configuration is correct and pam_sss is reached traversing the PAM stack.

  • Should there be some mention of the krb5_child.log in the authentication section? I think this probably depends on how generic this backend guide is intended to be(non-AD, non-IPA specific)

  • Do you think debug_level 9 logs should be requested at least in the Asking for help section?

@pbrezina could you please rebase the patches and include links to the new page from the generic troubleshooting guide that is now merged?

Thank you for rebasing the patch, but why did you add a new troubleshooting section and not added the new document to the existing user section? I would be OK with a troubleshooting sub-section under 'users' but the main idea I had was to keep the pages most people need (and most people are users looking for troubleshooting) at the top in the users section so it's easily visible.

I would also propose to get bidirectional links between this page and the generic troubleshooting guide.

Did I rebase the patch? I don't remember :-) I can put it into users directory, I don't mind. Also Justin's idea are good, I will incorporate them into the document. Thank you.

rebased

6 years ago

I added Justin's comments and move it to users/troubleshooting.

@jstephen I assume that you are fine with current version or can you fine anything else after longer period :-) ?

Looks good to me, thanks!

Commit d534112 fixes this pull-request

Pull-Request has been merged by pbrezina@redhat.com

6 years ago