Ticket was cloned from Red Hat Bugzilla (product Red Hat Enterprise Linux 6): Bug 966757
Created attachment 752397
Traces from client and DC of failure and success

Description of problem:
When using SRV record failover for integration of SSSD with Active Directory, everything works fine if the first listed DNS server specified in resolv.conf is alive and well. However, if the first listed DNS server in resolv.conf is down, SSSD black holes the request. In a packet trace, we can see the behavior as a RST after the DNS server that is down is queried.

This can be problematic in Active Directory, as often, the DNS servers are the DCs, which in turn are also the LDAP servers. The point of failover is to survive an outage of an LDAP server, but if that LDAP server is also the first listed DNS server, SSSD just breaks. The behavior is similar to that of attempting to use RR DNS for failover, which is not supported as per the documentation.

SSSD should not fail if there are other viable DNS servers in resolv.conf. Instead, it should move on to the next DNS server and retry the request.

Version-Release number of selected component (if applicable):
# rpm -qa sssd
sssd-1.9.2-82.7.el6_4.x86_64

How reproducible:
Consistently reproducible

Steps to Reproduce:
1. Power down the first DNS server listed in resolv.conf
2. Stop SSSD, remove the cache and start SSSD
3. Attempt getent or id to the LDAP server

Actual results:
getent/id fails to return valid info.
The Kerberos ticket is issued properly and the SASL bind works, but the LDAP connection gets reset.

Expected results:
SSSD should pick up the next DNS server and retry the request.

Additional info:
Contents of /etc/resolv.conf file:

# cat /etc/resolv.conf
search domain.com
nameserver 10.61.179.155
nameserver 10.61.179.152
nameserver 10.61.179.174

The first server in the list is unreachable:

# ping 10.61.179.155
PING 10.61.179.155 (10.61.179.155) 56(84) bytes of data.
From 10.61.179.150 icmp_seq=2 Destination Host Unreachable
From 10.61.179.150 icmp_seq=3 Destination Host Unreachable
From 10.61.179.150 icmp_seq=4 Destination Host Unreachable
^C
--- 10.61.179.155 ping statistics ---
5 packets transmitted, 0 received, +3 errors, 100% packet loss, time 4206ms
pipe 3

Dig still works fine, meaning the other DNS servers are working properly:

# dig SRV _ldap._tcp.domain.com

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6 <<>> SRV _ldap._tcp.domain..com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 25072
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 3

;; QUESTION SECTION:
;_ldap._tcp.domain.com. IN SRV

;; ANSWER SECTION:
_ldap._tcp.domain.com. 600 IN SRV 0 100 389 2k8-dc-3.domain.com.
_ldap._tcp.domain.com. 600 IN SRV 0 100 389 2k8-dc-1.domain.com.
_ldap._tcp.domain.com. 600 IN SRV 0 100 389 2k8-dc-2.domain.com.

;; ADDITIONAL SECTION:
2k8-dc-3.domain.com. 3600 IN A 10.61.179.174
2k8-dc-1.domain.com. 3600 IN A 10.61.179.152
2k8-dc-2.domain.com. 3600 IN A 10.61.179.155

;; Query time: 3 msec
;; SERVER: 10.61.179.152#53(10.61.179.152)
;; WHEN: Thu May 23 17:14:22 2013
;; MSG SIZE rcvd: 260

The sssd.conf file is set up to leverage SRV records (by way of omitting the ldap_uri, krb5_kpasswd and krb5_server values):

# cat /etc/sssd/sssd.conf
[domain/default]
cache_credentials = True
case_sensitive = False

[sssd]
config_file_version = 2
services = nss, pam
domains = DOMAIN
debug_level = 0x4000

[nss]
filter_users = root,ldap,named,avahi,haldaemon,dbus,radiusd,news,nscd
filter_groups = root

[pam]

[domain/DOMAIN]
id_provider = ldap
auth_provider = krb5
case_sensitive = False
chpass_provider = krb5
ldap_search_base = dc=domain,dc=com
ldap_schema = rfc2307
ldap_sasl_mech = GSSAPI
ldap_user_object_class = user
ldap_group_object_class = group
ldap_user_home_directory = unixHomeDirectory
ldap_user_principal = userPrincipalName
ldap_group_member = memberUid
ldap_group_name = cn
ldap_account_expire_policy = ad
ldap_force_upper_case_realm = true
ldap_group_search_base = cn=Users,dc=domain,dc=com
ldap_sasl_authid = root/centos64.domain.com@DOMAIN.COM
entry_cache_timeout = 120
krb5_realm = DOMAIN.COM
cache_credentials = false
krb5_canonicalize = false

When the first server is down, SSSD lookups fail:

# service sssd stop
Stopping sssd: [ OK ]
# rm -f /var/lib/sss/db/*
# service sssd start
Starting sssd: [ OK ]
# getent passwd ldapuser
#

When a working DNS server is moved to first in the list, SSSD lookups succeed without even needing to restart SSSD:

# cat /etc/resolv.conf
search domain.com
nameserver 10.61.179.152
nameserver 10.61.179.155
nameserver 10.61.179.174

# ping 10.61.179.152
PING 10.61.179.152 (10.61.179.152) 56(84) bytes of data.
64 bytes from 10.61.179.152: icmp_seq=1 ttl=128 time=0.830 ms
64 bytes from 10.61.179.152: icmp_seq=2 ttl=128 time=0.266 ms
^C
--- 10.61.179.152 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1040ms
rtt min/avg/max/mdev = 0.266/0.548/0.830/0.282 ms

# getent passwd ldapuser
ldapuser:*:1603:513:ldapuser:/home/ldapuser:/bin/sh

Packet traces of the failed attempt and the successful attempt are attached. In the trace:
- SSSD is started after clearing the cache
- getent passwd ldapuser is executed

The traces are filtered for the IPs of the domain controllers and the client. Actual domain names are in the traces; the domain names above are just placeholders.

IPs:
client - 10.61.179.150
DC1/DNS1 - 10.61.179.152
DC2/DNS2 - 10.61.179.155
DC3/DNS3 - 10.61.179.174
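For illustration only, the expected behavior described in the report (skip a dead nameserver and retry against the next one in resolv.conf) could be sketched roughly as follows. This is not SSSD's resolver code; the function names `parse_nameservers` and `query_with_failover`, and the `fake_query` stand-in for a real SRV lookup, are hypothetical and exist only for this sketch. It assumes a failed lookup surfaces as an `OSError` (which covers timeouts and unreachable hosts in Python).

```python
from typing import Callable, List, Tuple


def parse_nameservers(resolv_conf_text: str) -> List[str]:
    """Return nameserver addresses in the order they appear in resolv.conf."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop trailing '#' comments, then split on whitespace.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def query_with_failover(
    nameservers: List[str], query: Callable[[str], list]
) -> Tuple[str, list]:
    """Try each nameserver in order; return (server, answer) from the first one that responds."""
    errors = []
    for server in nameservers:
        try:
            return server, query(server)
        except OSError as exc:  # TimeoutError and unreachable-host errors are OSError subclasses
            errors.append((server, exc))
    raise OSError(f"all nameservers failed: {errors}")


# resolv.conf contents taken from the report above.
RESOLV_CONF = """\
search domain.com
nameserver 10.61.179.155
nameserver 10.61.179.152
nameserver 10.61.179.174
"""


def fake_query(server: str) -> list:
    """Hypothetical stand-in for an SRV lookup: the first listed server is down."""
    if server == "10.61.179.155":
        raise TimeoutError("no response")
    return ["2k8-dc-1.domain.com.", "2k8-dc-2.domain.com.", "2k8-dc-3.domain.com."]


if __name__ == "__main__":
    used, answer = query_with_failover(parse_nameservers(RESOLV_CONF), fake_query)
    print(used, answer)
```

With the first server simulated as down, the loop falls through to 10.61.179.152, which is the behavior the reporter expects from SSSD instead of a failed lookup.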
Fields changed
blockedby: =>
blocking: =>
changelog: =>
coverity: =>
design: =>
design_review: => 0
feature_milestone: =>
fedora_test_page: =>
milestone: NEEDS_TRIAGE => SSSD 1.10.0
review: True => 0
selected: =>
testsupdated: => 0
milestone: SSSD 1.10.0 => SSSD 1.10.1
owner: somebody => mzidek
Michal, please test if this issue still persists with Pavel's patches that are currently on review in the thread called "[PATCHES] fix SRV expansion".
Moving tickets that didn't make 1.10.1 to the 1.10.2 bucket.
Moving tickets that didn't make 1.10.1 to 1.10.2
milestone: SSSD 1.10.1 => SSSD 1.10.2
patch: 0 => 1
resolution: => fixed
status: new => closed
changelog: => N/A, just a bugfix
Metadata Update from @jhrozek: - Issue assigned to mzidek - Issue set to the milestone: SSSD 1.10.2
SSSD is moving from Pagure to GitHub. This means that new issues and pull requests will be accepted only in SSSD's GitHub repository.
This issue has been cloned to GitHub and is available here: - https://github.com/SSSD/sssd/issues/3008
If you want to receive further updates on the issue, please navigate to the GitHub issue and click the subscribe button.
Thank you for understanding. We apologize for any inconvenience.