#552 failover code does not take multiple A records into account
Closed: Fixed None Opened 12 years ago by jhrozek.

If a host name resolves to multiple IP addresses, only the first one is tried (specifically in the case of the Kerberos provider). This may result in SSSD going offline even though the other addresses are up.

Stephen proposed that we have two different timeouts - one for the individual servers and one for all of them.


Fields changed

component: SSSD => Failover

Jakub and I discussed this a bit yesterday. I'll summarize our discussion here for posterity.

The request from the user was this, first: If a hostname resolves into multiple IP addresses, each of these addresses should be considered a potential failover address.

There are pros and cons to this approach:

Pros

- Failover can be controlled entirely by DNS without SRV records.
- It becomes easier to add and remove failover servers without updating sssd.conf on every system.

Cons

- This is different from how DNS is treated for nearly all other applications. For example, a web browser will use only the first IP received from DNS, and if it is unreachable will just return failure.
- This reduces the effectiveness of DNS as a load-balancer.
- It would require new timeout processing. For example, we would need to specify a maximum timeout for each server being checked, and we would need either a maximum timeout to check all servers, or we would need to specify how many servers on the list we would attempt to reach.
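The timeout processing described in the last point could look roughly like the following. This is only a minimal Python sketch of the idea, not SSSD's failover code: the function names ({{{resolve_all}}}, {{{try_connect}}}, {{{first_reachable}}}) and the parameters {{{per_server_timeout}}} and {{{total_timeout}}} are hypothetical, and a plain TCP connect stands in for whatever reachability check a real provider would use.

```python
import socket
import time


def resolve_all(hostname, port):
    """Return every distinct IP address the resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def try_connect(address, port, timeout):
    """Return True if a TCP connection to address:port succeeds in time."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(addresses, port, per_server_timeout, total_timeout,
                    connect=try_connect, clock=time.monotonic):
    """Try each address in order, bounded by both timeouts (hypothetical).

    Returns the first address that answers, or None if every candidate
    fails or the overall deadline passes first.
    """
    deadline = clock() + total_timeout
    for address in addresses:
        remaining = deadline - clock()
        if remaining <= 0:
            break  # overall budget exhausted, stop trying further servers
        # Never wait longer on one server than the overall budget allows.
        if connect(address, port, min(per_server_timeout, remaining)):
            return address
    return None
```

A caller would combine the two steps, for example {{{first_reachable(resolve_all("kerberos.example.com", 88), 88, 2.0, 10.0)}}}. Capping the number of servers tried instead of using an overall timeout, the alternative mentioned above, would amount to slicing the address list before the loop.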

Workaround

- Failover currently works perfectly fine with comma-separated entries in the sssd.conf. It would be a fairly simple matter to split a theoretical {{{kerberos.example.com}}} entry into two or more {{{kerberos1.example.com}}} ... {{{kerberosN.example.com}}} entries. These entries would then be load-balanced by DNS.

Fields changed

component: Failover => Documentation
doc: 0 => 1
milestone: NEEDS_TRIAGE => SSSD 1.2.2
owner: somebody => sbose

I think the original problem came from an Active Directory setup. With AD, an A record request for the domain name can return all the Domain Controllers of the domain. For example, try

host domain.name nameserver.domain.name

in an AD environment. As mentioned in http://technet.microsoft.com/en-us/library/cc759550%28WS.10%29.aspx, this was made possible to enable "a non-SRV-aware client to locate any domain controller in the domain by looking up an A record." Now that sssd is SRV-aware, we should add a paragraph to the man page explaining that server names in sssd can be either server names which resolve to a single IP address or SRV resource records like _ldap._tcp.DnsDomainName or _kerberos._udp.DnsDomainName, which may return more than one server.
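To illustrate why an SRV lookup can yield several failover candidates where a plain server name yields one, here is a small Python sketch. It is not taken from sssd: {{{SrvRecord}}}, {{{srv_query_name}}} and {{{failover_order}}} are hypothetical names, the records are assumed to have been fetched already by some resolver, and the ordering is a simplification of what RFC 2782 describes.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    """The fields carried by one DNS SRV resource record."""
    priority: int
    weight: int
    port: int
    target: str


def srv_query_name(service, protocol, domain):
    """Build an SRV owner name such as _ldap._tcp.example.com."""
    return "_{}._{}.{}".format(service, protocol, domain.rstrip("."))


def failover_order(records):
    """Order SRV records for failover: lowest priority value first.

    Simplification: ties are broken by highest weight, whereas RFC 2782
    asks for a weighted random choice among equal-priority records.
    """
    return sorted(records, key=lambda r: (r.priority, -r.weight))
```

Each record names its own target host and port, so a single configured name can expand into an ordered list of servers to try in turn.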

Fields changed

owner: sbose => davido

ffe0d31..5a24378 master -> master

Updated section on how failover works.

doc: 1 => 0
docupdated: 0 => 1
resolution: => fixed
status: new => closed

Fields changed

fixedin: => Doc

Fields changed

rhbz: => 0

Metadata Update from @jhrozek:
- Issue assigned to davido
- Issue set to the milestone: SSSD 1.2.2

5 years ago

SSSD is moving from Pagure to GitHub. This means that new issues and pull requests
will be accepted only in SSSD's GitHub repository.

This issue has been cloned to GitHub and is available here:
- https://github.com/SSSD/sssd/issues/1594

If you want to receive further updates on the issue, please navigate to the GitHub issue
and click the subscribe button.

Thank you for understanding. We apologize for any inconvenience.
