#835 intermittent startup errors
Closed: Invalid None Opened 13 years ago by jhrozek.

This problem is the real cause of https://bugzilla.redhat.com/show_bug.cgi?id=692472

On one of the test systems, the provider completely failed to read its configuration. This resulted in breakage such as passing NULL as the server name, which led to the crash mentioned above.

The logs on the affected machine showed:

(Thu Mar 31 17:40:22 2011) [sssd[be[AD]]] [load_backend_module] (7): Loading backend [ldap] with path [/usr/lib64/sssd/libsss_ldap.so].
(Thu Mar 31 17:40:22 2011) [sssd[be[AD]]] [dp_get_options] (6): Option ldap_uri has value ldaps://
(Thu Mar 31 17:40:22 2011) [sssd[be[AD]]] [dp_get_options] (6): Option ldap_search_base has value (null)
(Thu Mar 31 17:40:22 2011) [sssd[be[AD]]] [dp_get_options] (6): Option ldap_default_bind_dn has value (null)
(Thu Mar 31 17:40:22 2011) [sssd[be[AD]]] [dp_get_options] (6): Option ldap_default_authtok_type has value password
(Thu Mar 31 17:40:22 2011) [sssd[be[AD]]] [dp_get_options] (6): Option ldap_default_authtok has no binary value.
(Thu Mar 31 17:40:22 2011) [sssd[be[AD]]] [dp_get_options] (6): Option ldap_search_timeout has value 6
(Thu Mar 31 17:40:22 2011) [sssd[be[AD]]] [dp_get_options] (6): Option ldap_network_timeout has value 6
(Thu Mar 31 17:40:22 2011) [sssd[be[AD]]] [dp_get_options] (6): Option ldap_opt_timeout has value 6
(Thu Mar 31 17:40:22 2011) [sssd[be[AD]]] [dp_get_options] (6): Option ldap_tls_reqcert has value hard
(Thu Mar 31 17:40:22 2011) [sssd[be[AD]]] [dp_get_options] (6): Option ldap_user_search_base has value (null)
(Thu Mar 31 17:40:22 2011) [sssd[be[AD]]] [dp_get_options] (6): Option ldap_user_search_scope has value sub
(Thu Mar 31 17:40:22 2011) [sssd[be[AD]]] [dp_get_options] (6): Option ldap_user_search_filter has value (null)
(Thu Mar 31 17:40:22 2011) [sssd[be[AD]]] [dp_get_options] (6): Option ldap_group_search_base has value (null)
(Thu Mar 31 17:40:22 2011) [sssd[be[AD]]] [dp_get_options] (6): Option ldap_group_search_scope has value sub
(Thu Mar 31 17:40:22 2011) [sssd[be[AD]]] [dp_get_options] (6): Option ldap_group_search_filter has value (null)
(Thu Mar 31 17:40:22 2011) [sssd[be[AD]]] [dp_get_options] (6): Option ldap_schema has value rfc2307bis
(Thu Mar 31 17:40:22 2011) [sssd[be[AD]]] [dp_get_options] (6): Option ldap_offline_timeout has value 60
(Thu Mar 31 17:40:22 2011) [sssd[be[AD]]] [dp_get_options] (6): Option ldap_force_upper_case_realm is TRUE
(Thu Mar 31 17:40:22 2011) [sssd[be[AD]]] [dp_get_options] (6): Option ldap_enumeration_refresh_timeout has value 300
(Thu Mar 31 17:40:22 2011) [sssd[be[AD]]] [dp_get_options] (6): Option ldap_purge_cache_timeout has value 10800

Note the mix of NULL and default values.
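
To make that mix easier to see, the log lines above can be split into options logged as (null) and options that carry a value. This is only a minimal sketch: it assumes the `dp_get_options` line format shown in the log, and the names `OPTION_RE` and `split_options` are illustrative, not part of SSSD.

```python
import re

# Matches the tail of a dp_get_options debug line, for example:
#   "... Option ldap_uri has value ldaps://"
#   "... Option ldap_force_upper_case_realm is TRUE"
#   "... Option ldap_default_authtok has no binary value."
OPTION_RE = re.compile(
    r"\[dp_get_options\] \(\d+\): Option (?P<name>\S+) "
    r"(?:has value (?P<value>.*?)|is (?P<flag>TRUE|FALSE)|has no binary value\.)\s*$"
)


def split_options(lines):
    """Split log lines into unset option names and a dict of set options.

    An option counts as unset when it is logged as "(null)" or as having
    no binary value. Lines that are not dp_get_options lines are skipped.
    """
    unset, set_ = [], {}
    for line in lines:
        match = OPTION_RE.search(line)
        if match is None:
            continue
        value = match.group("value")
        if value is None:
            value = match.group("flag")
        if value is None or value == "(null)":
            unset.append(match.group("name"))
        else:
            set_[match.group("name")] = value
    return unset, set_
```

Feeding it the log from the affected machine lists `ldap_search_base`, `ldap_default_bind_dn` and the other (null) options in one place, separate from the options that did get a value.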

We have no reliable way to reproduce this.


Replying to [ticket:835 jhrozek]:

On one of the test systems, the provider completely failed to read its configuration. This resulted in breakage such as passing NULL as the server name, which led to the crash mentioned above.

Completely meaning the ini API failed or reading from the conf db failed?

Replying to [comment:1 dpal]:

Replying to [ticket:835 jhrozek]:

On one of the test systems, the provider completely failed to read its configuration. This resulted in breakage such as passing NULL as the server name, which led to the crash mentioned above.

Completely meaning the ini API failed or reading from the conf db failed?

I'm not entirely sure, but my guess is confdb. The reason being that the provider was able to read the "id_provider" value, because it was loading the ldap backend correctly. The subsequent confdb calls did not find the desired keys, so default values were used.
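
The guess above can be illustrated with a small sketch: a lookup that silently falls back to a built-in default when a key is missing yields NULL for options without a default and the default for everything else, and the caller cannot tell that anything went wrong. Everything here is hypothetical. A plain dict stands in for the confdb, `DEFAULTS`, `get_option`, `load_options` and `missing_keys` are made-up names, and the values are copied from the log, so whether each one is SSSD's real default is an assumption.

```python
# Hypothetical defaults table. None stands for "no default", which the
# log prints as "(null)". Values are taken from the log, not from SSSD.
DEFAULTS = {
    "ldap_search_base": None,
    "ldap_default_bind_dn": None,
    "ldap_search_timeout": 6,
    "ldap_offline_timeout": 60,
}


def get_option(store, name, defaults):
    """Return the stored value, or the default when the key is missing.

    A missing key is indistinguishable from a key that was explicitly
    set to its default value.
    """
    return store.get(name, defaults.get(name))


def load_options(store, defaults):
    """Resolve every known option against the store."""
    return {name: get_option(store, name, defaults) for name in defaults}


def missing_keys(store, required):
    """List required keys that the store does not contain at all."""
    return [name for name in required if name not in store]
```

With an empty store, `load_options({}, DEFAULTS)` returns the defaults table unchanged, which matches the pattern in the log. A check along the lines of `missing_keys` at startup would turn that silent fallback into an explicit error.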

It could be either one. Either the INI lookup didn't get all the values correctly to dump into the confdb, or the confdb searches were returning incorrectly.

Unfortunately, we don't have any way to reproduce the problem, and to the best of my knowledge it hasn't been seen again.

It's also possible that the machine in question had a bad stick of RAM or something.

It might also be related to disk space, memory, or SELinux.

milestone: NEEDS_TRIAGE => SSSD 1.6.0
priority: major => minor

This problem hasn't been seen lately. I'm closing this bug for now. It can be reopened if the bug reappears.

patch: => 0
resolution: => worksforme
status: new => closed

Metadata Update from @jhrozek:
- Issue set to the milestone: SSSD 1.6.0

7 years ago

SSSD is moving from Pagure to GitHub. This means that new issues and pull requests
will be accepted only in SSSD's GitHub repository.

This issue has been cloned to GitHub and is available here:
- https://github.com/SSSD/sssd/issues/1877

If you want to receive further updates on the issue, please navigate to the GitHub issue
and click the subscribe button.

Thank you for understanding. We apologize for any inconvenience.