#2439 Return a different errno from client when sssd is not running.
Closed: Fixed. Opened 5 years ago by jhrozek.

We need to change the errno that the client library returns when SSSD is not running. See the very detailed analysis by Carlos O'Donell below:

In all honesty, SSSD is behaving correctly. It returns status==NSS_STATUS_UNAVAIL and errno==ENOENT to indicate "A necessary input file cannot be found."[1] I can only assume this is because it is true: the daemon is not running, and the return status says exactly that. Such an error is propagated up to the client, getgrent in this case, as a NULL return with errno set to ENOENT.

However, returning such a status and errno defeats the purpose of having sss in nsswitch.conf. The point of having sss in the list is that SSSD can be transparently enabled without restarting any applications. This matters partly because nsswitch.conf is read only once, at startup or first use, and never again. Thus the sss nss plugin must always behave as if it were working correctly and return that it found nothing.

Therefore the sss nss plugin must always return status==NSS_STATUS_NOTFOUND and errno==0 if the backend is not available. This indicates to glibc that sss is functioning but has no data. The only downside is that nscd, or any other caching layer, may at this point assume sss has no data and cache this result, which then stays cached until the cache entry times out and the lookup is retried.

The sss nss plugin should not return status==NSS_STATUS_TRYAGAIN errno==EAGAIN, since that will still result in a userspace error.

With SSSD using status==NSS_STATUS_TRYAGAIN errno==EAGAIN:

getgrent() OK group(0) = root 
getgrent() OK group(1) = bin 
getgrent() OK group(2) = daemon 
...
getgrent: Resource temporarily unavailable
getgrent error 11

With SSSD using status==NSS_STATUS_UNAVAIL errno==ENOENT:

getgrent() OK group(0) = root 
getgrent() OK group(1) = bin 
getgrent() OK group(2) = daemon 
...
getgrent: No such file or directory
getgrent error 2

With SSSD using status==NSS_STATUS_NOTFOUND errno==0 (as suggested):

getgrent() OK group(0) = root 
getgrent() OK group(1) = bin 
getgrent() OK group(2) = daemon 
getgrent() OK group(3) = sys 
...
getgrent() OK group(185) = wildfly

This run completes successfully, and it is the only way an installed SSSD nss module should behave.

For example, a patch along these lines:

diff -urN sssd-1.11.6/src/sss_client/nss_group.c sssd-1.11.6.mod/src/sss_client/nss_group.c
--- sssd-1.11.6/src/sss_client/nss_group.c  2014-06-03 10:31:33.000000000 -0400
+++ sssd-1.11.6.mod/src/sss_client/nss_group.c  2014-09-10 12:21:52.330685026 -0400
@@ -539,6 +539,11 @@
     if (nret != NSS_STATUS_SUCCESS) {
         errno = errnop;
     }
+    /* Always pretend we have no data.  */
+    if (nret == NSS_STATUS_UNAVAIL) {
+        nret = NSS_STATUS_NOTFOUND;
+        errno = 0;
+    }

     sss_nss_unlock();
     return nret;
@@ -639,6 +644,11 @@
     if (nret != NSS_STATUS_SUCCESS) {
         errno = errnop;
     }
+    /* Always pretend we have no data.  */
+    if (nret == NSS_STATUS_UNAVAIL) {
+        nret = NSS_STATUS_NOTFOUND;
+        errno = 0;
+    }

     sss_nss_unlock();
     return nret;
---

This patch might be incomplete. You should review every place where you interface with glibc's NSS plugin mechanism and make sure you return an error only when it really is an error, not when the daemon is merely inactive or not installed.
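Since the same translation would be needed at every entry point, one option is to put it in a single helper. The following is only a sketch under assumptions: the name `sss_nss_hide_unavail` is hypothetical and does not exist in SSSD, and it assumes the `int *errnop` convention used by the reentrant NSS module functions, whereas the patched functions above keep `errnop` as a plain local `int`.

```c
#include <assert.h>
#include <errno.h>
#include <nss.h>

/*
 * Hypothetical helper (not part of SSSD): translate "daemon unavailable"
 * into "no data" just before an NSS entry point returns to glibc.
 * Every other status, including genuine errors, is passed through untouched.
 */
enum nss_status sss_nss_hide_unavail(enum nss_status nret, int *errnop)
{
    assert(errnop != NULL);

    if (nret == NSS_STATUS_UNAVAIL) {
        /* Pretend the module works but has nothing to report. */
        *errnop = 0;
        errno = 0;
        return NSS_STATUS_NOTFOUND;
    }
    return nret;
}
```

An entry point would then end with something like `return sss_nss_hide_unavail(nret, errnop);`, which keeps the policy in one place while the remaining call sites are reviewed.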

My actions:

  • Sent a patch to the Linux kernel man pages to expand the definitions:

http://marc.info/?l=linux-man&m=141036753210390&w=2

  • Will update nss.texi in the glibc manual to match.

Next steps for you:

  • Fix the sss nss plugin to return status==NSS_STATUS_NOTFOUND errno==0 when the daemon is down, indicating that the plugin is working correctly but has no data to provide.
  • Test to make sure that the negative cache hit in nscd for this is not too disruptive when sssd eventually comes online. You may need to invalidate nscd's cache just after sssd starts so that the user gets access to sssd data immediately.

[1] http://www.gnu.org/software/libc/manual/html_node/NSS-Modules-Interface.html#NSS-Modules-Interface


Fields changed

milestone: NEEDS_TRIAGE => SSSD 1.12.2

Fields changed

owner: somebody => mzidek

We need to do a release as requested by downstream. Moving tickets that are not already fixed or very close to being acked to 1.12.3.

milestone: SSSD 1.12.2 => SSSD 1.12.3

Fields changed

mark: => 0
owner: mzidek => lslebodn

This upstream ticket was requested by a downstream. Bumping the priority to make sure the ticket is closed as soon as possible.

priority: major => critical

Fields changed

patch: 0 => 1
status: new => assigned

resolution: => fixed
status: assigned => closed

Metadata Update from @jhrozek:
- Issue assigned to lslebodn
- Issue set to the milestone: SSSD 1.12.3

2 years ago
