#3217 Conflicting default timeout values
Closed: Fixed 4 years ago by jhrozek. Opened 7 years ago by gagrio.

dns_resolver_timeout should not have the same default value as ldap_opt_timeout.

The reason is that when we fail to resolve a server, there should still be time to try resolving another one before the LDAP bind fails.

In cases of direct AD integration where DNS=KDC=DC, failing to resolve the server causes the SASL bind to time out, since there is not enough time left to try to bind to the next server:

---
(Tue Oct  4 13:10:38 2016) [sssd[be[xxxx.com]]] [resolv_gethostbyname_dns_query] (0x0100): Trying to resolve A record of 'server.xxxx.com' in DNS
(Tue Oct  4 13:10:38 2016) [sssd[be[xxxx.com]]] [resolv_gethostbyname_dns_parse] (0x1000): Parsing an A reply
(Tue Oct  4 13:10:38 2016) [sssd[be[xxxx.com]]] [request_watch_destructor] (0x0400): Deleting request watch
(Tue Oct  4 13:10:38 2016) [sssd[be[xxxx.com]]] [sdap_connect_host_resolv_done] (0x0400): Connecting to ldap://server.xxxx.com:389
(Tue Oct  4 13:10:38 2016) [sssd[be[xxxx.com]]] [sss_ldap_init_send] (0x0400): Setting 6 seconds timeout for connecting
(Tue Oct  4 13:10:38 2016) [sssd[be[xxxx.com]]] [sdap_ldap_connect_callback_add] (0x1000): New LDAP connection to [ldap://server.xxxx.com:389/??base] with fd [21].
(Tue Oct  4 13:10:38 2016) [sssd[be[xxxx.com]]] [sdap_connect_host_done] (0x0400): Successful connection to ldap://server.xxxx.com:389
(Tue Oct  4 13:10:38 2016) [sssd[be[xxxx.com]]] [sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with [(&(DnsDomain=xxxx.com)(NtVer=\14\00\00\00))][].
(Tue Oct  4 13:10:38 2016) [sssd[be[xxxx.com]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [netlogon]
(Tue Oct  4 13:10:38 2016) [sssd[be[xxxx.com]]] [sdap_process_result] (0x0040): ldap_result error: [Can't contact LDAP server]
---

Also the "successful connection" message looks kind of misleading.


Hi Pavel, could you please take a look at this ticket? I'm asking you because you're quite familiar with the failover code: it seems to me that the failover timeout fires before we can cycle through all resolver servers.

So I think we should test whether cycling through all resolvers works correctly.

owner: somebody => pbrezina


Both linked bugzillas have the same id, is this a duplicate entry or a typo?

We fail to cycle if we time out at the LDAP level, since the timeouts are the same, as mentioned in the ticket description. This is something we should change.

Can you try setting ldap_tls_reqcert = never in your domain and check whether you can connect? The message "Successful connection" followed by "Can't contact LDAP server" may indicate that we successfully connected to the server but the TLS certificate was refused.

Please put these comments in the bugzilla, it has a much better chance of being read there.

Also, could you please check https://bugzilla.redhat.com/show_bug.cgi?id=1390210 ? I thought I linked it with this ticket, but apparently not.

Thanks for the tip Pavel.
We don't have an active case/issue anymore but please keep the priority on this bug as it can potentially affect a lot of users.

Yes, those two BZs seem to be the same issue with the timeout default values.

Fields changed

milestone: NEEDS_TRIAGE => SSSD 1.15 Beta

Fields changed

milestone: SSSD 1.15.3 => SSSD 1.15.2

There is actually one more undocumented timeout: dns_resolver_op_timeout, which is the timeout for a single DNS resolution. To summarize, we have:

  1. dns_resolver_op_timeout -- timeout for a single DNS query
  2. dns_resolver_timeout -- timeout for service resolution (it may include multiple DNS queries)
  3. ldap_opt_timeout -- timeout for an LDAP query, which may trigger DNS resolution depending on the current failover status

Thus the correct order should be:

dns_resolver_op_timeout < dns_resolver_timeout < ldap_opt_timeout
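
The ordering above can be expressed as a small sanity check. This is only a sketch in Python, not SSSD code: the function name `check_timeout_order` is made up for illustration, and the numeric values in the usage lines are arbitrary examples, not SSSD defaults.

```python
def check_timeout_order(dns_resolver_op_timeout, dns_resolver_timeout, ldap_opt_timeout):
    """Return a list of problems with the given timeouts (in seconds).

    An empty list means the strict ordering
    dns_resolver_op_timeout < dns_resolver_timeout < ldap_opt_timeout holds.
    Hypothetical helper for illustration only.
    """
    problems = []
    # A single DNS query must be able to fail while leaving time for another query.
    if not dns_resolver_op_timeout < dns_resolver_timeout:
        problems.append(
            "dns_resolver_op_timeout (%s) must be lower than dns_resolver_timeout (%s)"
            % (dns_resolver_op_timeout, dns_resolver_timeout)
        )
    # Service resolution must be able to fail while leaving time for the LDAP operation.
    if not dns_resolver_timeout < ldap_opt_timeout:
        problems.append(
            "dns_resolver_timeout (%s) must be lower than ldap_opt_timeout (%s)"
            % (dns_resolver_timeout, ldap_opt_timeout)
        )
    return problems


if __name__ == "__main__":
    # Illustrative values only: equal timeouts reproduce the conflict in this ticket.
    for values in [(6, 6, 6), (2, 4, 8)]:
        issues = check_timeout_order(*values)
        print(values, "OK" if not issues else issues)
```

With equal values, both comparisons fail, which matches the situation described in this ticket: resolution and the LDAP operation time out together, so there is no time left to move to the next server.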

Metadata Update from @gagrio:
- Issue assigned to pbrezina
- Issue set to the milestone: SSSD 1.15.2

7 years ago

Metadata Update from @jhrozek:
- Issue close_status updated to: None
- Issue set to the milestone: SSSD 1.15.3 (was: SSSD 1.15.2)

7 years ago

Metadata Update from @jhrozek:
- Issue set to the milestone: SSSD 1.15.4 (was: SSSD 1.15.3)

7 years ago

Metadata Update from @jhrozek:
- Issue tagged with: cleanup-two-point-oh

6 years ago

Metadata Update from @jhrozek:
- Issue untagged with: cleanup-two-point-oh
- Issue set to the milestone: SSSD 2.0 (was: SSSD 1.15.4)

6 years ago

Metadata Update from @jhrozek:
- Issue priority set to: critical (was: major)
- Issue tagged with: breaks compatibility

6 years ago

Metadata Update from @jhrozek:
- Issue set to the milestone: SSSD 2.1 (was: SSSD 2.0)

5 years ago

Metadata Update from @jhrozek:
- Issue set to the milestone: SSSD 2.2 (was: SSSD 2.1)

5 years ago

Metadata Update from @jhrozek:
- Issue set to the milestone: SSSD 2.3 (was: SSSD 2.2)

4 years ago

Metadata Update from @jhrozek:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

4 years ago

SSSD is moving from Pagure to GitHub. This means that new issues and pull requests
will be accepted only in SSSD's GitHub repository.

This issue has been cloned to GitHub and is available here:
- https://github.com/SSSD/sssd/issues/4250

If you want to receive further updates on the issue, please navigate to the GitHub issue
and click the subscribe button.

Thank you for understanding. We apologize for any inconvenience.
