#1600 F25 System Wide Change: KillUserProcesses=yes by default
Closed None Opened 3 years ago by jkurik.

For the 2016-07-15 meeting, as the Change Proposal was announced on the devel-announce@ list on 2016-07-07.

Set the default policy to terminate processes in session scope when the user logs out. Specifically, systemd-logind's KillUserProcesses setting, which currently is set to "no" to override the upstream default, will be removed to follow the upstream default of "yes".
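
To make the effect of the setting concrete, here is a minimal Python sketch (not taken from the Change page) that reads logind.conf-style text and reports which value is in effect. It assumes the option lives in the `[Login]` section, that a missing or commented-out key falls back to the upstream default of "yes", and that only the boolean spellings understood by Python's `configparser` are used. The function name `effective_kill_user_processes` is hypothetical.

```python
import configparser

# Assumption: upstream systemd defaults to KillUserProcesses=yes, as stated above.
UPSTREAM_DEFAULT = True


def effective_kill_user_processes(conf_text: str) -> bool:
    """Return the KillUserProcesses value selected by logind.conf-style text."""
    # strict=False tolerates repeated keys; the last assignment wins.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(conf_text)
    # A missing section or key means no override, so the default applies.
    return parser.getboolean("Login", "KillUserProcesses", fallback=UPSTREAM_DEFAULT)


# The override this Change proposes to drop, versus a file with no override.
fedora_override = "[Login]\nKillUserProcesses=no\n"
no_override = "[Login]\n#KillUserProcesses=no\n"
print(effective_kill_user_processes(fedora_override))
print(effective_kill_user_processes(no_override))
```

The first call reports the current Fedora behaviour (processes are left alone); the second reports what happens once the override is removed.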


Please don't do this. Or at least do not do this for Fedora Server, it has no place there.

OK, so this is effectively the other half of ticket #1580.

As the Change Proposal stands today, I wouldn't be willing to approve it. So let's figure out what it would take to get there:

  • I want to see a crowdsourced list of all the packages known to be negatively affected by this change. (This should be possible by investigating the various email threads on the issue).
  • The list of affected packages should be divided into two categories ''by FESCo'':
    • Tier 1 packages must be ported to support operation under KillUserProcesses=yes by the Contingency Deadline. If any of the Tier 1 packages are still not migrated by that point, we return to the KillUserProcesses=no default until Fedora 26.
    • Tier 2 packages are non-blocking for this effort. They will be tracked as bugs (probably also listed on the F25 Common Bugs page), but a failure to meet this deadline is not sufficient to prevent us from switching.
  • I'd like to see the Contingency Deadline set at Beta Freeze.
  • I would like to see a guideline written up on the wiki and added to the Change Proposal on some of the ways that the porting could be done (such as converting tools to services, using the allow-linger functionality, etc.)
  • I'd like the package maintainers of any package FESCo declares Tier 1 to be added to the Change Proposal as an owner.
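
The contingency rule above can be sketched as a small decision function. This is only an illustration of the rule as worded in this comment; the function name and the shape of the input (a mapping from package name to "has it been ported?") are hypothetical, and Tier 2 packages are deliberately absent because they do not block the switch.

```python
def default_for_release(tier1_ported: dict[str, bool]) -> str:
    """Return the KillUserProcesses default to ship at the Contingency Deadline."""
    unported = sorted(name for name, done in tier1_ported.items() if not done)
    # Any unported Tier 1 package triggers the contingency: stay on "no".
    return "no" if unported else "yes"


# Hypothetical status at Beta Freeze: one Tier 1 package is still unported.
print(default_for_release({"screen": True, "tmux": False}))
```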

I'm in favor of what sgallagh is proposing.

The change does mention screen, tmux, and nohup, although the wording leaves open what might happen if that isn't done in time. I'm in favor of delaying the change until these and any other possible "tier 1" packages are done.

Replying to [comment:4 sgallagh]:

  • I want to see a crowdsourced list of all the packages known to be negatively affected by this change. (This should be possible by investigating the various email threads on the issue).

Such a list would be open-ended, since it includes any program which ignores
SIGHUP, on purpose or not. I guess you mean a list of packages which we
would want to modify to "auto-exempt" themselves.
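
For context, ignoring SIGHUP takes a single call, which is part of why such a list is hard to bound. Below is a minimal, Unix-only Python sketch of a process that shrugs off a hangup; the helper name `survives_sighup` is hypothetical. Surviving SIGHUP is the traditional way to outlive a closing terminal, but with KillUserProcesses=yes such a process still belongs to the login session and, as discussed in this ticket, is terminated at logout anyway.

```python
import os
import signal


def survives_sighup() -> bool:
    """Ignore SIGHUP, send it to ourselves, and report that we are still running."""
    previous = signal.signal(signal.SIGHUP, signal.SIG_IGN)
    try:
        # The signal is discarded because its disposition is "ignore".
        os.kill(os.getpid(), signal.SIGHUP)
        return True
    finally:
        # Restore the earlier disposition (None means it was set outside Python).
        if previous is not None:
            signal.signal(signal.SIGHUP, previous)


print(survives_sighup())
```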

I don't think this list is going to be terribly large, i.e. tmux, screen.
dnf, packagekit, rpm-ostree were also mentioned, but packagekit uses a daemon,
as does rpm-ostree. That leaves dnf.

But I don't think it's a good idea to put any program that you might
want ever to use in the background on the list. There's always systemd-run
and friends to do such things. This would make the division between
tier 1 and tier 2 below moot.
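
As a rough illustration of "systemd-run and friends", the Python sketch below only builds the command lines and does not execute them. It assumes the `systemd-run --user --scope` form quoted elsewhere in this ticket and the `loginctl enable-linger` command for the allow-linger functionality; the helper names and the tmux invocation are examples, not part of the Change.

```python
import shlex


def in_user_scope(command: list[str]) -> list[str]:
    """Wrap a command so it runs in its own scope under the user's systemd instance."""
    return ["systemd-run", "--user", "--scope", *command]


def enable_linger(user: str) -> list[str]:
    """Build the loginctl call that keeps a user's service manager after logout."""
    return ["loginctl", "enable-linger", user]


# Show the resulting command line for a detached tmux session.
print(shlex.join(in_user_scope(["tmux", "new-session", "-d"])))
```

Either list can be handed to `subprocess.run` on a machine where the tools are installed; whether the wrapped process then survives logout depends on the local systemd setup.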

  • The list of affected packages should be divided into two categories ''by FESCo'':
    • Tier 1 packages must be ported to support operation under KillUserProcesses=yes by the Contingency Deadline. If any of the Tier 1 packages are still not migrated by that point, we return to the KillUserProcesses=no default until Fedora 26.
    • Tier 2 packages are non-blocking for this effort. They will be tracked as bugs (probably also listed on the F25 Common Bugs page), but a failure to meet this deadline is not sufficient to prevent us from switching.

  • I'd like to see the Contingency Deadline set at Beta Freeze.
    Sure, that's reasonable. I amended the Change page to this effect.

  • I would like to see a guideline written up on the wiki and added to the Change Proposal on some of the ways that the porting could be done (such as converting tools to services, using the allow-linger functionality, etc.)
    Ack.

  • I'd like the package maintainers of any package FESCo declares Tier 1 to be added to the Change Proposal as an owner.

Nothing against that, I certainly would welcome more people on the ticket.

Replying to [comment:4 sgallagh]:


+1 to this Proposal / Course of Action.

From today's FESCo meeting:

agreed FESCo approves KillUserProcesses=yes by default with the steps sgallagh has proposed in the ticket (+7, 0, 0)

Removing the meeting keyword, but leaving the ticket open to track the Tier1 stuffs.

During the meeting, I think we had a general consensus that the screen and tmux programs were definitely in Tier 1. Some initial investigation was done on dnf, but it looks like that package exits on session end regardless of the KillUserProcesses setting, so as it's no worse than the current state, I nominate that for Tier 2.

There are open questions on the nohup command as well as the disown shell built-in.

Replying to [comment:6 zbyszek]:

I don't think this list is going to be terribly large, i.e. tmux, screen.
dnf, packagekit, rpm-ostree were also mentioned, but packagekit uses a daemon,
as does rpm-ostree. That leaves dnf.

Some BTRFS tool and a Screen clone named Neercs were also mentioned in the first email thread, and in the current thread Nico Kadel-Garcia mentioned NX and an unnamed Perl script. And that's just what I happen to remember.

One thing we forgot to discuss last week was a deadline on providing the Tier 1 list. I'm going to suggest that we have this week to gather the list and make a decision on Tier 1 no later than the meeting on July 29th (chosen so that we have the option to use available hack-space at Flock to work on some of this if the relevant parties are present).

At the very least, I think there's no question that screen and tmux are Tier 1. I'd also like to assert that we will not block on any package not currently shipping in the Fedora Collection (such as Neercs). The btrfs example cited looks to me like something people would generally run under screen or tmux to handle the reparenting, rather than doing it themselves.

I think we need to explore nohup and the bash disown built-in next.

If we want, I can also send a message to devel-announce asking for our users to provide a list of tools that are expected to behave like screen that they know of (making it clear that we don't want a list of tools that they prefer to run ''under'' screen)

nohup and disown should clearly be part of the Tier 1 list. The intention to run the command regardless of the shell it was spawned from is clearly articulated by using nohup or disown; there is no discussion about that.

And I still think it is clearly wrong to enable this "feature" for ssh sessions on server installations, when its only meaningful purpose is for GUI sessions on workstations.

Replying to [comment:14 tmraz]:

And I still think it is clearly wrong to enable this "feature" for ssh sessions on server installations, when its only meaningful purpose is for GUI sessions on workstations.

As a former sysadmin, I disagree with this from experience. This is precisely the desired behavior on many multi-user shell servers, and is at least reasonable behavior on individual servers as well, where random tasks from login sessions ''shouldn't'' be long running. It's definitely desired on compute nodes, where junior sysadmins spend more time than should be necessary cleaning up after processes that have escaped their batch scheduler.

That is to say, there certainly exists non-GUI, non-workstation use for this. I'm ambivalent on whether that means the default should be switched for server, but I have the sense that this is a big enough change that it would be best if the default were the same across all of Fedora. If the Server WG feels strongly otherwise, okay (but I kind of suspect they won't, at least with the approved resolution here).

I suggest that the release notes shall contain two lists of affected programs:

  • First, all packages that have been adapted shall be listed to let users of those programs know that they're safe if they use the Fedora package, but if they install the program in some other way, then it may break. This will be the tier 1 packages plus any tier 2 packages that have also been adapted.
  • Second, all known affected programs that haven't been ported shall be listed, whether they are packaged in Fedora or not, to warn users of those programs that they need to disable KillUserProcesses or use some other workaround. This will be any tier 2 packages that haven't been adapted, and also Neercs and other unpackaged programs that it's reasonably likely that someone is using on Fedora.

Replying to [comment:12 sgallagh]:

The btrfs example cited looks to me like something people would generally run under screen or tmux to handle the reparenting, rather than doing it themselves.

According to Chris Murphy the btrfs command backgrounds itself and survives logout when KillUserProcesses is off, but it gets interrupted when KillUserProcesses is on. There is no screen or nohup in his example, only sudo:

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/MAVJU5DMAQGU2NBN3Q3PVWMORNMUHPOI/

And I stumbled on two more candidates: X2go and a certain usage of the SSH Agent:

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/HBZTOLXWJDFQ4GKT35FTRRYMSELYOJRC/

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/F3V4GJ4PO25PWAM45EGZYLLN7KJSQLDS/

OK, since no one else is apparently going to chime in here, I'm going to make a concrete proposal that we can accept or debate:

== Proposal ==
=== Tier 1 ===
* screen
* tmux
* nohup
* disown

=== Tier 2 ===
* btrfs tools
  * btrfs is not considered production-ready in Fedora, those using it are assumed to be sufficiently technically competent that they can choose to disable this feature.
* x2go
  * Given Fedora's march to Wayland, maintaining this uncommon legacy software is not a blocker.
* ssh-agent
  * Looking into this, it looks like a very specific usage of ssh wherein credentials are delegated to the target machine. This is an insecure usage and should be discouraged and I have no problems asserting that this is an advanced usage eligible for manually disabling this feature.

In today's meeting:

agreed accept sgallagh's proposal from https://fedorahosted.org/fesco/ticket/1600#comment:17 (+7,0,0)

I think the feature is premature considering it doesn't completely fix the problem it's intended to solve, while excessively burdening a still as yet unknown number of programs with needing systemd specific changes. And worse, it causes the entire system, GUI, console access, everything, to hang for those 90 seconds. Only remote access gives me any indication what's going on. Even the mouse arrow freezes for 90 seconds upon initiating restart or shutdown.

KillUserProcesses true does not kill user gdm session on restart, restart hangs 1m30s
https://bugzilla.redhat.com/show_bug.cgi?id=1341837

"btrfs is not considered production-ready in Fedora, those using it are assumed to be sufficiently technically competent that they can choose to disable this feature."

There are numerous logical fallacies that make this a ridiculous conclusion: 1. there is zero information in the journal leading the user toward systemd at all, let alone the KillUserProcesses feature; 2. there is no possible way to get information into the journal (e.g. systemd.log_level=debug) in order to get a hint at why the background process is killed; 3. the excuse of production readiness of anything, let alone Btrfs, in contrast to a feature that itself isn't even working as designed and can't be considered production ready either, is weird circular logic; and 4. assuming that a user of something not production ready has some sort of clairvoyance to avoid orthogonal conflicts means every Fedora user using Fedora 25 will be assumed to be sufficiently technically competent to disable this feature as well, whenever they run into any problem that the logs do not at all indicate is related to this feature.

I think the logic for approving the feature is far more flaky than the feature itself. The feature itself is at least half-baked; in practice it works as designed for only one of three user session exit functions (logout yes, restart no, shutdown no).

Let's postpone this until F26.

There have been a bunch of other systemd bugs and I didn't have time to work on this properly.

Ping. F26 is due pretty soon, what is the status of this feature?
We (xpra) got bitten by this bug:
https://github.com/systemd/systemd/issues/3388
"systemd-run --user --scope ... doesn't work with unified cgroup hierarchy"

So we ended up integrating xpra with logind via our system wide proxy server, as per:
https://lists.freedesktop.org/archives/systemd-devel/2016-November/037700.html

But I doubt that screen or tmux will want to do that much work...

In Fedora-Workstation-Live-x86_64-26_Beta-1.4.iso, KillUserProcesses=no, so it seems postponed until F27 at the earliest.
