#3959 sss_cache: Do nothing if SYSTEMD_OFFLINE=1
Closed 5 years ago by jhrozek. Opened 5 years ago by walters.
SSSD / walters/sssd: systemd-offline into master

file modified
+15
@@ -147,6 +147,21 @@

      bool skipped = true;
      struct sss_domain_info *dinfo;

+     /* In offline mode, there's not going to be a sssd instance
+      * running.  This occurs for both e.g. yum --installroot
+      * as well as rpm-ostree offline updates.

+      *
+      * So let's just quickly do nothing.  (Though note today
+      * yum --installroot doesn't set this variable, rpm-ostree
+      * does)


+      *
+      * For more information on the variable, see:
+      * https://github.com/systemd/systemd/pull/7631
+      */
+     const char *systemd_offline = getenv ("SYSTEMD_OFFLINE");

+     if (systemd_offline && strcmp (systemd_offline, "1") == 0)
+       return 0;
+
      ret = init_context(argc, argv, &tctx);
      if (ret == ENOENT) {
          /* nothing to invalidate; no reason to fail */

Today, running rpm-ostree compose tree results in a flood of warnings like:

⠙ Running pre scripts... openssh
openssh.prein: (Fri Feb 15 15:50:41:748148 2019) [sss_cache] [confdb_init] (0x0010): Unable to open config database [/var/lib/sss/db/config.ldb]
openssh.prein: Could not open available domains
openssh.prein: groupadd.rpmostreesave: sss_cache exited with status 5
openssh.prein: groupadd.rpmostreesave: Failed to flush the sssd cache.
openssh.prein: (Fri Feb 15 15:50:41:774909 2019) [sss_cache] [confdb_init] (0x0010): Unable to open config database [/var/lib/sss/db/config.ldb]
openssh.prein: Could not open available domains
openssh.prein: groupadd.rpmostreesave: sss_cache exited with status 5

This is because rpm-ostree doesn't want scripts writing into /var;
it's system-administrator managed state.

Really, SSSD should probably be silently ignoring system users.

But let's just silently do nothing if we're running offline, as
there won't be a sssd running.

rebased onto 0d31527

5 years ago

(Haven't tested this yet though, putting up for comments)

I think this patch is OK, thank you for the contribution.

Since the error is coming from shadow-utils, another suggestion (that IIRC the shadow-utils maintainer had..) was just to ignore ALL errors from sss_cache from the shadow-utils invocation.

Ignoring system users might not help because sssd mirrors the whole content of /etc/{passwd,group}, regardless of what kind of user it is.

But the issue is not with a non-running sssd, but with the read-only filesystem for /var/lib/sss/db.

The documentation for systemd does not say anything about read-only /var
https://systemd.io/ENVIRONMENT

$SYSTEMD_OFFLINE=[0|1] — if set to 1, then systemctl will refrain from talking to PID 1; this has the same effect as the historical detection of chroot(). Setting this variable to 0 instead has a similar effect as SYSTEMD_IGNORE_CHROOT=1; i.e. tools will try to communicate with PID 1 even if a chroot() environment is detected. You almost certainly want to set this to 1 if you maintain a package build system or similar and are trying to use a modern container system and not plain chroot().

So relying on undefined behavior is not ideal. There might be use-cases for SYSTEMD_OFFLINE in future;
this change might be an issue.

Is there another variable created by rpm-ostree which says that /var/ is read-only?

Slightly off-topic: how does installation of packages work with a read-only /var/? Packages usually create directory structure there.

> Slightly off-topic: how does installation of packages work with a read-only /var/? Packages usually create directory structure there.

rpm-ostree translates subdirectories of /var to systemd-tmpfiles. See https://github.com/projectatomic/rpm-ostree/blob/5c69bcb4feee50b30d3f3eaf9e7feb6acae47d8a/src/libpriv/rpmostree-importer.c#L669
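For illustration, such a translated entry might look like the following tmpfiles.d sketch (illustrative only; the actual entries, modes, and ownership that rpm-ostree generates are derived from the RPM payload):

```
# Hypothetical tmpfiles.d entries that recreate package-owned directories
# under /var at boot, instead of shipping them in the read-only image
d /var/lib/sss 0755 root root -
d /var/lib/sss/db 0700 root root -
```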

> But the issue is not with not-running sssd but with read-only filesystem for /var/lib/sss/db

Well... it's both, right? Why should sss_cache be doing anything if sssd isn't running?

But if you prefer I can rework this to test for writability of /var.

Reminded of this issue every time I run a compose. :(

> But if you prefer I can rework this to test for writability of /var.

Maybe let's just check both? I.e. that /var is r/o or that SYSTEMD_OFFLINE=1.
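A combined guard along those lines might look like the following sketch (not the actual patch; the function name `sss_cache_should_skip` and the hard-coded database path are illustrative):

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Sketch of the combined check discussed above: skip the cache flush if
 * EITHER SYSTEMD_OFFLINE=1 is set OR the sssd database directory is not
 * writable.  Function name and path are illustrative. */
static int sss_cache_should_skip(void)
{
    const char *offline = getenv("SYSTEMD_OFFLINE");

    if (offline != NULL && strcmp(offline, "1") == 0) {
        return 1;   /* offline build environment, e.g. rpm-ostree compose */
    }

    if (access("/var/lib/sss/db", W_OK) != 0) {
        return 1;   /* read-only (or missing) /var: nothing to invalidate */
    }

    return 0;
}
```

Note the two conditions are joined with an or: either one alone is enough reason to do nothing.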

What's the status here?
I have an open BZ https://bugzilla.redhat.com/show_bug.cgi?id=1645118

because the issue essentially blocks me from manipulating passwords when in a chroot.
Instead of using 'passwd' I need to paste the ugly hash right into /etc/shadow, replacing the original entry.

I'm sorry this took long. We mostly use github for pull requests and the PRs on pagure are less visible, so they sometimes are left behind (arguably we should disallow pagure PRs or move to github fully, but..let's not discuss this here..)

Thank you very much for the patch. The commit and one follow-up was pushed to sssd master as 073b03a
and 9f9d7ec

Pull-Request has been closed by jhrozek

5 years ago

Hmm, I think there are still some remnants of this. Composing FCOS today, which currently ships sssd-2.2.2-3.fc31.x86_64, still produces this spam:

Running pre scripts... systemd
systemd.prein: (Fri Dec 13 18:52:47:212932 2019) [sss_cache] [confdb_init] (0x0010): Unable to open config database [/var/lib/sss/db/config.ldb]
systemd.prein: Could not open available domains
systemd.prein: groupadd: sss_cache exited with status 5
systemd.prein: groupadd: Failed to flush the sssd cache.
systemd.prein: (Fri Dec 13 18:52:47:265692 2019) [sss_cache] [confdb_init] (0x0010): Unable to open config database [/var/lib/sss/db/config.ldb]
systemd.prein: Could not open available domains
systemd.prein: groupadd: sss_cache exited with status 5
systemd.prein: groupadd: Failed to flush the sssd cache.

Haven't looked very deeply though.

Hi @jlebon,

thanks for making us aware of this. Do you know which of the conditions, SYSTEMD_OFFLINE=1 or /var not writable, is met in your case?

@thalman, about 073b03a, can you check if the extended if-clause is really doing what was expected? I think the intention was to skip if one of the conditions is met (or), not only if both are met (and).

bye,
Sumit

This issue has increased in urgency for us because it's not just cosmetic: if sssd is enabled on the host and RPM scripts aren't run in a container, then sssd can cause uids/gids to "leak" from the host:
https://github.com/coreos/rpm-ostree/issues/2126

And yes I think that condition should be an or, not an and.

> This issue increased in urgency for us because it's not just cosmetic, if sssd is enabled on the host, and RPM scripts aren't run in a container then sssd can cause uid/gids to "leak" from the host:
> https://github.com/coreos/rpm-ostree/issues/2126
> And yes I think that condition should be an or, not an and.

@walters this is already in sssd >= 2.2.0

We're seeing this again in Fedora 34 (sssd-client-2.4.2-3.fc34):

systemd.prein: (2021-04-30 19:24:05:006073): [sss_cache] [ldb] (0x0020): Unable to open tdb '/var/lib/sss/db/config.ldb': No such file or directory                                                                                          
systemd.prein: (2021-04-30 19:24:05:006156): [sss_cache] [ldb] (0x0020): Failed to connect to '/var/lib/sss/db/config.ldb' with backend 'tdb': Unable to open tdb '/var/lib/sss/db/config.ldb': No such file or directory
systemd.prein: (2021-04-30 19:24:05:006179): [sss_cache] [confdb_init] (0x0010): Unable to open config database [/var/lib/sss/db/config.ldb]                                                                                                 
systemd.prein: (2021-04-30 19:24:05:006258): [sss_cache] [init_domains] (0x0020): Could not initialize connection to the confdb                                                                                                              
systemd.prein: Could not open available domains            
systemd.prein: (2021-04-30 19:24:05:006276): [sss_cache] [init_context] (0x0040): Initialization of sysdb connections failed                                                                                                                 
systemd.prein: (2021-04-30 19:24:05:006290): [sss_cache] [main] (0x0020): Error initializing context for the application                                                                                                                     
systemd.prein: groupadd: sss_cache exited with status 5                                                               
systemd.prein: groupadd: Failed to flush the sssd cache.

Hi,

> We're seeing this again in Fedora 34 (sssd-client-2.4.2-3.fc34):

While this is partially https://github.com/SSSD/sssd/issues/5488
and (in that part) should be fixed via https://github.com/SSSD/sssd/commit/0cddb67128edc86be4163489e29eaa3c4e123b7b,
the fix above won't affect

"[sss_cache] [confdb_init] (0x0010): Unable to open config database..."

The sssd NSS plugin should basically do the stat("/run/systemd") and do absolutely nothing if it isn't present.

I guess, failing that we can change rpm-ostree to remove sssd from /etc/nsswitch.conf during builds.
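A guard along the lines of that suggestion could be sketched as below (illustrative only; `dir_exists` is a hypothetical helper, and note libsystemd's own sd_booted() checks /run/systemd/system/ specifically):

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Illustrative helper: true if the given path exists and is a directory.
 * The suggested guard would call this with "/run/systemd" and make the
 * NSS plugin do nothing when it is absent, on the theory that no sssd
 * instance can be running against this filesystem tree. */
static bool dir_exists(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```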

> The sssd NSS plugin should basically do the stat("/run/systemd") and do absolutely nothing if it isn't present.

SSSD can be used on a machine that doesn't use systemd at all.
