#2853 Reduce default service timeout to 15s
Closed: Insufficient data 2 years ago by kevin. Opened 2 years ago by aday.

The Workstation Working Group has a longstanding issue with services delaying shutdown for 2 minutes, and this issue continues to be extremely frustrating for our users:

https://pagure.io/fedora-workstation/issue/163

We have therefore been seeking to reduce the shutdown delay to 15 seconds. Initial attempts to make this change upstream stalled:

https://github.com/systemd/systemd/pull/18386

As a result, we're proposing a downstream configuration change:

https://src.fedoraproject.org/rpms/systemd/pull-request/85
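For readers unfamiliar with how such a default is set, here is a minimal sketch. It assumes the setting in question is the manager-wide `DefaultTimeoutStopSec=` option from systemd-system.conf(5). It does not reproduce the contents of the linked pull request, and the drop-in file name and the helper `render_stop_timeout_dropin` are hypothetical.

```python
def render_stop_timeout_dropin(seconds: int = 15) -> str:
    """Return the text of a systemd manager drop-in that sets the default stop timeout."""
    if seconds <= 0:
        raise ValueError("timeout must be positive")
    # [Manager] and DefaultTimeoutStopSec= are documented in systemd-system.conf(5).
    return f"[Manager]\nDefaultTimeoutStopSec={seconds}s\n"


if __name__ == "__main__":
    # Hypothetical file name, chosen for illustration only.
    print("# /etc/systemd/system.conf.d/50-stop-timeout.conf")
    print(render_stop_timeout_dropin(15), end="")
```

A unit that needs more time to stop can still set its own `TimeoutStopSec=` in its `[Service]` section, which takes precedence over the manager default.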

This change was proposed a week ago and there hasn't been a response, despite our attempts to reach the relevant maintainers. We'd like this change to be implemented for Fedora 37: can FESCo help us to make this happen?

@catanzaro


I believe this should go through the change process and hence is too late for Fedora 37. But I don't have a strong opinion and would probably accept a very very late change for Fedora 37. However, I would not support doing this without loud communication about it.

Metadata Update from @churchyard:
- Issue set to the milestone: Fedora Linux 37

2 years ago

I'd rather see the services that are broken getting actually fixed than just killing them when things take too long - and that doesn't even cover cases where things might have legitimate reasons for blocking shutdown (PackageKit is a repeat offender here, and just killing it while it's still running seems like a good way to hose your system).

But even if we need to kill services for some reason or other after a timeout (however long that timeout would be), wouldn't it be a much better experience either way to have plymouth display a message in such cases? Something like "Shutdown is taking longer than expected, please don't power off the system." (this is also what Windows does when it's doing maintenance work during shutdown).
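The "kill after a timeout" behaviour under discussion can be illustrated with a simplified sketch: send SIGTERM, wait for a grace period, then fall back to SIGKILL. This is an approximation for illustration, not systemd's actual implementation. The helper names and the stubborn child process are hypothetical, and the example assumes a POSIX system.

```python
import subprocess
import sys

# Child that ignores SIGTERM, standing in for a service that hangs on stop.
STUBBORN_CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def stop_with_timeout(proc: subprocess.Popen, timeout_s: float) -> str:
    """Send SIGTERM, wait up to timeout_s seconds, then fall back to SIGKILL."""
    proc.terminate()  # SIGTERM: a polite request to exit
    try:
        proc.wait(timeout=timeout_s)
        return "exited"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL: cannot be caught or ignored
        proc.wait()
        return "killed"


def demo(timeout_s: float = 0.5) -> str:
    """Start the stubborn child and stop it with the given grace period."""
    proc = subprocess.Popen(
        [sys.executable, "-c", STUBBORN_CHILD],
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()  # wait until the child has installed its handler
    result = stop_with_timeout(proc, timeout_s)
    proc.stdout.close()
    return result


if __name__ == "__main__":
    print(demo())
```

The grace period is the only knob here: a shorter value makes shutdown faster but gives a busy process less time to finish cleanly, which is the trade-off debated in this thread.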

I'd rather see the services that are broken getting actually fixed than just killing them when things take too long

The problem here is that it's very difficult to debug what's going on. PackageKit seems to be the worst offender. Despite our best efforts, we haven't been able to track down the source issue. Plus that component isn't actively maintained and is planned to be retired.

and that doesn't even cover cases where things might have legitimate reasons for blocking shutdown

What would you call a legitimate reason to block shutdown? For workstation we strongly feel that the system ought to behave as instructed. There are cases like performing a system update at shutdown, but that is presented to the user as an option, not something that happens outside of their control.

Workstation targets portable devices. People have to run from their desks to catch trains, etc! It's not acceptable for us to unexpectedly not shut down as normal.

The problem here is that it's very difficult to debug what's going on. PackageKit seems to be the worst offender. Despite our best efforts, we haven't been able to track down the source issue. Plus that component isn't actively maintained and is planned to be retired.

As far as I can tell, this happens when PackageKit refreshes metadata in the background and this takes longer than expected (for example, on poor internet connection or when hitting a slow mirror).

What would you call a legitimate reason to block shutdown?

We have multiple systemd services that do maintenance tasks, mostly on a schedule: the fstrim timer, mlocate db updates, manpage db updates, etc. Depending on the timing of the "shut down now!" request, those services might block shutdown for a minute or two, and I'm not sure whether it's a good idea to kill those, especially if they're handling potentially cancel-sensitive tasks (like fstrim).

It's not acceptable for us to unexpectedly not shut down as normal.

I don't see a way to solve this without fundamentally changing the way those background / scheduled services run. For example, the fstrim timer runs on first boot after Sunday IIRC - and if the user remembers they need to run to the train just after booting their system on a Monday morning, that's bad timing, but cancelling the fstrim task just to "shut down as instructed" is probably not a good idea, either.

On Wed, Aug 17, 2022 at 8:40 AM Fabio Valentini pagure@pagure.io wrote:

decathorpe added a new comment to an issue you are following:

The problem here is that it's very difficult to debug what's going on. PackageKit seems to be the worst offender. Despite our best efforts, we haven't been able to track down the source issue. Plus that component isn't actively maintained and is planned to be retired.

As far as I can tell, this happens when PackageKit refreshes metadata in the background and this takes longer than expected (for example, on poor internet connection or when hitting a slow mirror).

It also happens because of a design flaw in how DNF currently handles
metadata fetching: it has to spin up GPG agents for temporary keyrings
so that it can support GPG signature verification for metadata. The
agents hang and can't be closed properly from librepo's side, so
PackageKit hangs on shutting down the backend. There's no way to fix
that, since the gpgme library makes it impossible to manage the gpg
agent process spawned by the library.

As an aside, PackageKit is maintained, there's just not a lot that can
be done right now to fix the problem as long as we can't use a unified
keyring and in-process GPG verification.

--
真実はいつも一つ！/ Always, there's only one truth!

As a process matter, I agree that this should be an F38 Change proposal. Ideally, it would include a list of services known to be frequent offenders in the hopes that we could try to rally some fixes to those cases.

@aday You have a good idea here, but F37 is too far along for the change to go in now. Could you submit a change proposal for F38? There's a wiki page with links explaining the process.

where things might have legitimate reasons for blocking shutdown

Things that have a legitimate reason to block shutdown can and do already inhibit shutdown, using systemd and gnome-session apis that exist for that purpose.

If PackageKit does not inhibit shutdown, it should be ruthlessly killed.
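As an illustration of the inhibit mechanism mentioned above, the following sketch builds a `systemd-inhibit` command line that runs a task while holding a shutdown inhibitor lock. `systemd-inhibit` is the command-line front end to logind's inhibitor locks. The helper `inhibit_command`, the `who` label, and the example task are hypothetical, and the code only constructs the argument list without executing it.

```python
from typing import List


def inhibit_command(why: str, command: List[str], who: str = "example-updater") -> List[str]:
    """Build an argv that runs `command` while holding a shutdown inhibitor lock."""
    if not command:
        raise ValueError("command must not be empty")
    return [
        "systemd-inhibit",
        "--what=shutdown",  # the operation to inhibit
        f"--who={who}",     # label identifying the lock holder
        f"--why={why}",     # human-readable reason for the lock
        "--mode=block",     # hold a blocking lock rather than a delay lock
        *command,           # the lock is held while this command runs
    ]


if __name__ == "__main__":
    # Hypothetical task standing in for work that should not be interrupted.
    print(inhibit_command("Applying updates", ["sleep", "30"]))
```

Passing the resulting list to `subprocess.run` would hold the lock for as long as the wrapped command runs. Long-running services would more typically take the lock through logind's D-Bus interface instead.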

Assuming this issue doesn't magically resolve itself for F37, we'll target F38 and put in a change proposal.

Thanks. Will close this one for now in favor of an F38 change.

Metadata Update from @kevin:
- Issue close_status updated to: Insufficient data
- Issue status updated to: Closed (was: Open)

2 years ago
