.. highlight:: none

Multiple server addresses or names in kdcinfo files
===================================================

Related ticket(s):
------------------
* TBD

Problem statement
-----------------
When a user authenticates using Kerberos, the KDCs that are actually used
are either discovered by libkrb5 with the help of DNS SRV records,
configured explicitly in ``/etc/krb5.conf``, or provided by a special
`locator plugin`.

Because the administrator expects that the servers they defined in
``sssd.conf`` are used both for authentication through SSSD and by
applications that use libkrb5, such as ``kinit`` and the other Kerberos
command line tools, SSSD provides a locator plugin for libkrb5 that allows
SSSD to inform libkrb5 about the servers SSSD has configured.

However, SSSD, at least in the typical use case, only writes the information
about the single server it connects to and changes the address only when
the daemon reconnects to a different server. This creates a problem when
the server whose address is written in the kdcinfo file is unreachable,
but no action towards SSSD that would provoke a fail over (such as a
user login over PAM) is executed. In that case, the kdcinfo file contains
stale entries. Because the kdcinfo files are authoritative from the libkrb5
point of view, libkrb5 cannot reach any KDC from that domain if the
information present there is not useful.

To improve the situation, this design page proposes adding a new SSSD option
that, if set, enables SSSD to write additional host names into the
kdcinfo files. The plugin can then iterate over these items, which in turn
gives libkrb5 a form of fail over for the servers configured in
``sssd.conf`` or autodiscovered by SSSD.

Use cases
---------
A typical sequence that triggers this problem is this:

* log in with a PAM service to a machine. This causes a KDC address to
  be written to the kdcinfo file
* disable the KDC server, e.g. by enabling a restrictive firewall rule
* call ``kinit`` on the client where the kdcinfo file was written

Overview of the solution
------------------------
The Kerberos locator plugin reads the address(es) from per-realm text files
written by SSSD located in the ``/var/lib/sss/pubconf`` directory. At the
moment, the plugin can already read multiple entries, but only
numerical addresses are supported.
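
The following sketch shows one way the plugin side could split the content of a kdcinfo file into entries. It assumes one entry per line. Entries may also carry a port, which is not handled here. The helper ``split_kdcinfo_entries`` is hypothetical and is not taken from the SSSD sources.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split kdcinfo file content, assumed to hold one
 * entry per line, into separate entries. The buffer is modified in place
 * (newlines are replaced by NUL bytes) and empty lines are skipped.
 * Returns the number of entries stored in 'entries'. */
static size_t split_kdcinfo_entries(char *buf, char *entries[],
                                    size_t max_entries)
{
    size_t n = 0;
    char *line = buf;

    while (line != NULL && *line != '\0' && n < max_entries) {
        char *nl = strchr(line, '\n');

        if (nl != NULL) {
            *nl = '\0';          /* terminate the current entry */
        }
        if (*line != '\0') {
            entries[n++] = line; /* keep only non-empty lines */
        }
        line = (nl != NULL) ? nl + 1 : NULL;
    }
    return n;
}
```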

On a high level, implementing this RFE requires several changes:

* change the Kerberos locator plugin so that it can also consume
  host names in addition to numerical addresses. These host names
  would be resolved in the plugin itself and passed to libkrb5 with
  the help of a callback function libkrb5 provides to the plugin
* add a new SSSD option that would limit the number of entries that
  SSSD writes to the kdcinfo files. This is needed to avoid timeouts
  in case the network is truly unreachable. The default value
  of the option could perhaps be different in master and sssd-1-16:
  master could default to writing multiple entries, but
  sssd-1-16 would default the option to 0 in order not to change
  the behaviour of a stable branch.
* extend the online callback which the SSSD fail over component uses
  to write the current server to the kdcinfo files so that it also writes
  additional server host names after the current server address
* to enable writing multiple server addresses, extend the request that
  resolves a server for a service so that it resolves host names
  up to the specified limit

When it comes to resolving the servers, there are several scenarios to
consider:

* The servers can be enumerated using an option. This includes
  ``krb5_server/krb5_backup_server`` for the krb5 provider and
  ``ipa_server/ipa_backup_server`` and ``ad_server/ad_backup_server``
  for the IPA and AD providers.
* The servers can be completely autodiscovered. Typically this is
  done by either omitting the ``*_server`` options completely or
  using the ``_srv_`` identifier. As long as the list is omitted
  or the ``_srv_`` record is the first one in the list, any fail
  over service resolution triggers the DNS SRV lookups and
  resolves the whole list. Note that the ``_srv_``
  identifier is not permitted in the backup server list explicitly,
  but the AD provider does resolve a SRV query into the backup
  server list. This happens when an AD site is used: the servers
  from the AD site are added as 'primary' and the global servers
  form the 'backup' list.
* A mix of the above. The most complex case from the point of view of
  this RFE is a list that starts with a host name, but includes
  the ``_srv_`` identifier later on, e.g.
  ``krb5_server = kdc.example.com, _srv_``. In this case, calling the fail
  over resolution currently only resolves the host name ``kdc.example.com``,
  but not the SRV query. Unless the fail over code is extended, the
  host names originating from the SRV query are therefore not known
  after the service resolution finishes.
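
To make the three scenarios concrete, the sketch below classifies a comma-separated server option such as ``krb5_server`` into three kinds. A value that is empty or starts with ``_srv_`` means autodiscovery. A value with explicit host names only is the enumerated case. Host names followed by ``_srv_`` are the mixed case that needs the fail over code extension. The enum and function names are hypothetical and not part of SSSD.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of a *_server option value. */
enum server_list_kind {
    LIST_AUTODISCOVERED, /* empty option, or _srv_ is the first entry */
    LIST_NAMES_ONLY,     /* only explicit host names */
    LIST_MIXED           /* host names first, _srv_ later on */
};

/* Compare one trimmed token of 'len' bytes with the _srv_ identifier. */
static int token_is_srv(const char *tok, size_t len)
{
    return len == strlen("_srv_") && strncmp(tok, "_srv_", len) == 0;
}

static enum server_list_kind classify_server_list(const char *value)
{
    size_t index = 0; /* position among the non-empty tokens */
    int seen_srv = 0;
    const char *p = value;

    while (p != NULL && *p != '\0') {
        const char *end = strchr(p, ',');
        size_t len = (end != NULL) ? (size_t)(end - p) : strlen(p);

        /* Trim leading and trailing blanks of the token. */
        while (len > 0 && isspace((unsigned char)*p)) {
            p++;
            len--;
        }
        while (len > 0 && isspace((unsigned char)p[len - 1])) {
            len--;
        }

        if (len > 0) {
            if (token_is_srv(p, len)) {
                if (index == 0) {
                    return LIST_AUTODISCOVERED; /* _srv_ comes first */
                }
                seen_srv = 1;
            }
            index++;
        }
        p = (end != NULL) ? end + 1 : NULL;
    }

    if (index == 0) {
        return LIST_AUTODISCOVERED; /* option omitted or empty */
    }
    return seen_srv ? LIST_MIXED : LIST_NAMES_ONLY;
}
```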

Implementation details
----------------------
The interface the locator plugin uses to communicate with libkrb5 is a
callback function provided by the caller (libkrb5). The plugin passes a
``struct sockaddr`` for each KDC address to the caller through this callback.
The Kerberos locator plugin is already capable of iterating over multiple
addresses, but currently only numerical addresses are supported. The plugin
converts the string representation of the address into a ``struct sockaddr``
by calling ``getaddrinfo(3)`` with the ``AI_NUMERICHOST`` flag. We should
extend the locator plugin so that entries that do not represent a numerical
address are treated as host names. These names are resolved with
``getaddrinfo(3)`` and the resulting addresses are passed to libkrb5. This
can be a first self-contained step in the implementation.
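
Below is a minimal sketch of the proposed plugin change, assuming POSIX ``getaddrinfo(3)``. It tries ``AI_NUMERICHOST`` first, as today. When the entry is not a numerical address, ``getaddrinfo`` reports ``EAI_NONAME`` and the sketch falls back to a regular host name lookup. The ``addr_cb`` type only mirrors the shape of the libkrb5 locator callback (callback data, socket type, ``struct sockaddr``), so the example compiles without the krb5 headers. ``resolve_kdcinfo_entry`` and ``count_addrs`` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <string.h>

/* Same shape as the callback libkrb5 hands to a locator plugin
 * (cbdata, socktype, address); declared locally for the sketch. */
typedef int (*addr_cb)(void *cbdata, int socktype, struct sockaddr *addr);

/* Hypothetical helper: turn one kdcinfo entry into addresses and pass each
 * of them to the callback. Numerical addresses are parsed as before; other
 * entries are resolved as host names. Returns 0 on success or the
 * getaddrinfo() error code. */
static int resolve_kdcinfo_entry(const char *entry, const char *port,
                                 int socktype, addr_cb cb, void *cbdata,
                                 int *is_numeric)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int ret;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = socktype;
    hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;

    ret = getaddrinfo(entry, port, &hints, &res);
    *is_numeric = (ret == 0);
    if (ret == EAI_NONAME) {
        /* Not a numerical address: resolve it as a host name. This may
         * block on DNS. */
        hints.ai_flags = AI_NUMERICSERV;
        ret = getaddrinfo(entry, port, &hints, &res);
    }
    if (ret != 0) {
        return ret;
    }

    for (struct addrinfo *ai = res; ai != NULL; ai = ai->ai_next) {
        /* A non-zero return from the callback stops the iteration. */
        if (cb(cbdata, ai->ai_socktype, ai->ai_addr) != 0) {
            break;
        }
    }
    freeaddrinfo(res);
    return 0;
}

/* Example callback for trying the sketch: count the addresses. */
static int count_addrs(void *cbdata, int socktype, struct sockaddr *addr)
{
    (void)socktype;
    (void)addr;
    (*(int *)cbdata)++;
    return 0;
}
```

Because the host name lookup can block, limiting the number of entries written to the kdcinfo files matters for the overall libkrb5 timeout.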

The kdcinfo files are written (using ``write_krb5info_file``) either
during an online callback or in a special case for IPA trust clients. The
special case already does something similar to what this page
is about. It looks into a subsection representing a trusted domain
(e.g. ``[domain/ipa.test/win.trust.test]``) and resolves all the servers
in that list, either by name or based on a site selection. However, this
is done during the subdomain provider operation, not during a resolver
callback, and all the addresses configured in the ``sssd.conf`` file are
always resolved and written to the kdcinfo file.
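
To illustrate the proposed file content, the hypothetical helper below formats the current server's address followed by at most ``lookahead`` host names, one entry per line. It assumes the lookahead value counts the extra host names after the current address, so that 0 keeps today's behaviour. Writing the result to disk is left out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build kdcinfo content with the current server's
 * address first and at most 'lookahead' additional host names, one entry
 * per line. Returns the number of entries written, or -1 if 'out' is
 * too small. */
static int format_kdcinfo(char *out, size_t outlen, const char *current_addr,
                          const char *const names[], size_t num_names,
                          size_t lookahead)
{
    size_t used;
    int entries = 0;
    int len;

    len = snprintf(out, outlen, "%s\n", current_addr);
    if (len < 0 || (size_t)len >= outlen) {
        return -1;
    }
    used = (size_t)len;
    entries++;

    /* Append the extra host names, up to the lookahead limit. */
    for (size_t i = 0; i < num_names && i < lookahead; i++) {
        len = snprintf(out + used, outlen - used, "%s\n", names[i]);
        if (len < 0 || (size_t)len >= outlen - used) {
            return -1;
        }
        used += (size_t)len;
        entries++;
    }
    return entries;
}
```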

The ``write_krb5info_file`` function receives a linked list of ``struct fo_server``
structures. Each structure contains the address, if already resolved, or at
least a host name in the ``struct server_common`` member structure. Since the
callback is already synchronous and should not do much work on its own, it
would be best if the callback were invoked with the required data already
provided.

There are two kinds of servers in the fail over module: primary and
backup. The backup servers are supposed to be used only temporarily,
and SSSD periodically tries to connect to one of the primary servers.
However, from the fail over code point of view, even adding a "backup"
server still means the server is added to the same linked list, just with
a flag denoting that the server is not primary. Therefore, iterating over
a single list iterates over both the primary and backup servers.

Before changing the online callbacks, it would be useful to implement and
read the ``krb5_kdcinfo_lookahead`` option so that there is already an
upper limit when the callbacks write the extra host names.

The next step of the implementation could be extending the online
callbacks that call the ``write_krb5info_file`` function. There are
several of them: ``ad_resolve_callback``, ``ipa_resolve_callback``
and ``krb5_resolve_callback``. The callbacks receive the current
``struct fo_server`` instance. The callbacks would then keep iterating
over the linked list until either the list is exhausted or
``krb5_kdcinfo_lookahead`` items have been processed. The host name from the
``struct server_common`` structure would be read using ``fo_get_server_name``
and written to the array passed to ``write_krb5info_file``.
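
The iteration described above can be sketched with simplified stand-ins for the fail over structures. Only the fields used here are modelled; the real ``struct fo_server`` and ``struct server_common`` contain more. ``mock_get_server_name`` plays the role of ``fo_get_server_name``. All of these definitions are assumptions for illustration. Because backup servers share the linked list, the same loop also reaches them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for the fail over structures (hypothetical). */
struct server_common {
    const char *name;
};

struct fo_server {
    struct fo_server *next;
    struct server_common *common;
    bool primary; /* backup servers live in the same list */
};

/* Stand-in for fo_get_server_name(). */
static const char *mock_get_server_name(const struct fo_server *server)
{
    return (server->common != NULL) ? server->common->name : NULL;
}

/* Hypothetical callback helper: starting after the current server, collect
 * up to 'lookahead' host names until the list is exhausted. Entries without
 * a name are skipped. Returns the number of names stored in 'names'. */
static size_t collect_lookahead_names(const struct fo_server *current,
                                      size_t lookahead, const char *names[])
{
    size_t n = 0;

    for (const struct fo_server *s = current->next;
         s != NULL && n < lookahead; s = s->next) {
        const char *name = mock_get_server_name(s);

        if (name != NULL) {
            names[n++] = name;
        }
    }
    return n;
}
```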

One question to consider is whether to also use the ``fo_server`` instances
before the current one, i.e. those that SSSD tried before and couldn't connect
to. I think it would make sense to add them to the end of the list, at least
for the primary servers not originating from a SRV query, because SSSD never
reconnects to a server earlier in the list as long as a later server works.
Servers from SRV queries are different in this respect: the SRV results time
out and force SSSD to resolve the whole list once a server is requested again
(typically either during authentication or once the LDAP connection expires).
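
One possible ordering, following the suggestion above, is sketched here with a simplified, hypothetical list type. Servers after the current one come first. Earlier primary servers that did not originate from a SRV query are optionally appended at the end. The ``include_earlier`` switch stands for the open design question and is not an existing option.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the fail over list (hypothetical). */
struct fo_server {
    struct fo_server *next;
    const char *name;
    bool primary;
    bool from_srv; /* server came from a SRV query */
};

/* Hypothetical ordering: servers after 'current' first, then (optionally)
 * the primary, non-SRV servers that precede 'current' in the list. At most
 * 'limit' names are stored in 'names'; returns how many were stored. */
static size_t order_candidates(const struct fo_server *head,
                               const struct fo_server *current,
                               size_t limit, bool include_earlier,
                               const char *names[])
{
    size_t n = 0;

    for (const struct fo_server *s = current->next; s != NULL && n < limit;
         s = s->next) {
        names[n++] = s->name;
    }

    if (include_earlier) {
        /* SRV results are re-resolved anyway, so they are not appended. */
        for (const struct fo_server *s = head;
             s != NULL && s != current && n < limit; s = s->next) {
            if (s->primary && !s->from_srv) {
                names[n++] = s->name;
            }
        }
    }
    return n;
}
```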

Finally, the case where the fail over code needs to do additional lookups
in order to resolve at least the number of host names requested by the
``krb5_kdcinfo_lookahead`` option should be addressed. The caller that
initializes the fail over service (maybe with ``be_fo_add_service``) should
provide a hint with the value of the lookahead option. Then, if a request for
server resolution is triggered, the fail over code would resolve a server
and afterwards check whether enough ``fo_server`` entries with a valid host
name in the ``struct server_common`` structure exist. If not, the request would
check whether any of the ``fo_server`` structures represents a SRV query and
try to resolve the query to receive more host names.
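
The additional check could look roughly like the following sketch. It again uses a simplified, hypothetical list where an unresolved SRV query is represented by an entry without a host name. ``wanted`` stands for the hint derived from ``krb5_kdcinfo_lookahead``. The function names are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in: an entry is either a named server or a placeholder
 * for a not yet resolved SRV query (hypothetical model). */
struct fo_server {
    struct fo_server *next;
    const char *name; /* NULL for an unresolved SRV placeholder */
    bool is_srv_query;
};

/* Count entries that already carry a host name. */
static size_t count_named_servers(const struct fo_server *head)
{
    size_t n = 0;

    for (const struct fo_server *s = head; s != NULL; s = s->next) {
        if (s->name != NULL) {
            n++;
        }
    }
    return n;
}

/* Hypothetical check run after the normal server resolution: if fewer than
 * 'wanted' host names are known, return the first SRV placeholder that
 * should be resolved to obtain more names; otherwise return NULL. */
static const struct fo_server *srv_needing_resolution(
        const struct fo_server *head, size_t wanted)
{
    if (count_named_servers(head) >= wanted) {
        return NULL; /* enough names already */
    }
    for (const struct fo_server *s = head; s != NULL; s = s->next) {
        if (s->is_srv_query && s->name == NULL) {
            return s;
        }
    }
    return NULL; /* nothing left that could provide more names */
}
```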

Configuration changes
---------------------
A new configuration option called ``krb5_kdcinfo_lookahead`` would be added.
This option would default to a sensible non-zero value in the master
branch, perhaps 3, so that attempting to resolve the extra host names does
not cause the libkrb5 operation to time out. If the patches are backported
to any stable branch, the option must default to 0 (disabled).

In the first iteration, we might want to just read a single number, but
in the future, the option should be extended to accept two numbers in the
``total:backup`` notation. This would mean: write up to ``total`` servers,
but include up to ``backup`` servers from the backup list. This would be
useful in case none of the servers from the primary list are reachable,
e.g. because they all come from the same AD site, while servers outside the
site are reachable. This extension would only make sense if SSSD does not
resolve the host names on its own, which might be another future extension.
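
A possible parser for the option value is sketched below. It accepts both the single number and the ``total:backup`` form. The structure, the function name and the decision to reject a ``backup`` value larger than ``total`` are assumptions made for this example, not settled behaviour.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of krb5_kdcinfo_lookahead. */
struct kdcinfo_lookahead {
    size_t total;  /* maximum number of host names to write */
    bool split;    /* true if the total:backup form was used */
    size_t backup; /* with split: at most this many from the backup list */
};

/* Parse "N" or "total:backup". Returns 0 on success, EINVAL otherwise. */
static int parse_kdcinfo_lookahead(const char *value,
                                   struct kdcinfo_lookahead *out)
{
    char *end;
    unsigned long v;

    /* strtoul() would silently accept a minus sign, so reject it early. */
    if (value == NULL || strchr(value, '-') != NULL) {
        return EINVAL;
    }

    errno = 0;
    v = strtoul(value, &end, 10);
    if (end == value || errno != 0) {
        return EINVAL;
    }
    out->total = (size_t)v;
    out->split = false;
    out->backup = 0;

    if (*end == ':') {
        const char *b = end + 1;

        errno = 0;
        v = strtoul(b, &end, 10);
        if (end == b || errno != 0) {
            return EINVAL;
        }
        out->split = true;
        out->backup = (size_t)v;
    }

    /* Trailing garbage, or more backup servers than the total, is an error. */
    if (*end != '\0' || out->backup > out->total) {
        return EINVAL;
    }
    return 0;
}
```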

It might be a good idea to add a note to the ``sssd-ad`` and ``sssd-ipa``
man pages, or even to the shared fail over man page include file, with a
pointer to how the kdcinfo files work so that the information is easy to
discover for administrators.

How To Test
-----------
Plugin test
    With any of the below tests, or even after writing the host names to
    the kdcinfo files directly, make sure the first entry in the list is
    unreachable. Then call e.g. ``kinit`` and check that the operation
    succeeds.

Backwards compatibility test
    Set the ``krb5_kdcinfo_lookahead`` option to 0. Define multiple servers
    and perform Kerberos authentication. Make sure that only the current
    server is written to the kdcinfo files.

Write a list of servers
    Set the ``krb5_kdcinfo_lookahead`` option to a positive value. Make sure
    that the first entry in the kdcinfo files is an address and the other
    entries are host names from the configuration. This test case should be
    extended to make sure that only as many entries as the value of the
    option are written or, if the configuration file lists fewer servers,
    that all of them are written.

Fail over test
    Similar to the above, except make sure the first entry in the list cannot
    be contacted. Then, SSSD should resolve the next entry to the address
    and, if applicable, write the rest of the list.

Backup server test
    At the minimum, we should make sure that servers from the backup list
    are written to the kdcinfo files. If the option implements the split
    ``total:backup`` value, then those values should be tested as well.

(Optional) writing a previously tried, not working server
    If it is agreed during design review that servers which could not be
    contacted are also written to the kdcinfo files (see the discussion of
    previously tried servers in the implementation details), then a test case
    should make sure those servers are written to the end of the list.

SRV resolution test
    Leave the server list option (e.g. ``krb5_server``) empty. Make sure
    a DNS SRV query for the configured realm returns valid servers and
    that they are written to the kdcinfo files.

Combined SRV and server list
    Set the ``krb5_server`` option to ``hostname, _srv_``. Set the
    ``krb5_kdcinfo_lookahead`` option to a value greater than 1. Make
    sure that the host names from the DNS SRV query are also present
    in the kdcinfo files.

IPA client test
    The test cases above should be repeated for an IPA client as well in
    case the IPA online callbacks are modified.

AD site test
    Add an AD client to a site or set the site in the config file. Make
    sure that the servers from the site are written first, followed
    by the global servers up to the ``krb5_kdcinfo_lookahead`` value.
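
For the "Write a list of servers" test, a small checker like the following could verify the content once the kdcinfo file has been split into entries. It is a hypothetical helper. It assumes plain addresses without port suffixes. It also counts the lookahead as extra host names after the current address, consistent with the backwards compatibility test.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdbool.h>
#include <stddef.h>

/* True if 'entry' is a numerical IPv4 or IPv6 address. */
static bool is_numeric_address(const char *entry)
{
    unsigned char buf[sizeof(struct in6_addr)];

    return inet_pton(AF_INET, entry, buf) == 1 ||
           inet_pton(AF_INET6, entry, buf) == 1;
}

/* Hypothetical test helper: the first entry must be an address, all other
 * entries host names, and at most 1 + lookahead entries may be present. */
static bool kdcinfo_content_ok(const char *const entries[], size_t count,
                               size_t lookahead)
{
    if (count == 0 || count > lookahead + 1) {
        return false;
    }
    if (!is_numeric_address(entries[0])) {
        return false;
    }
    for (size_t i = 1; i < count; i++) {
        if (is_numeric_address(entries[i])) {
            return false;
        }
    }
    return true;
}
```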

How To Debug
------------
Any new code must be decorated with DEBUG messages. To debug the locator
plugin changes, using ``KRB5_TRACE`` or even calling ``strace`` might be
useful.

Future development
------------------
First, it might be useful to extend the resolver or fail over code to resolve
the names on its own to save some potentially blocking calls in the plugin.
There is already an example of ``resolv_hostport_list_send`` that can perhaps
be reused.

Additionally, we have been planning for some time to include connectivity
checks with cLDAP ping or just plain ``connect()`` to make sure that servers
that cannot be contacted at all are not tried. This is of course outside of
the scope of this work, but should be kept in mind so that nothing
incompatible is implemented.

Authors
-------
* Sumit Bose <sbose@redhat.com>
* Tomas Halman <thalman@redhat.com>
* Jakub Hrozek <jhrozek@redhat.com>

Design document about multiple server names or addresses in the kdcinfo files and enabling locator plugin fail over