PR#1333: Scriptlets: recommend against writing to `/var` - packaging-committee


Opened 2 months ago by jlebon. Modified 21 days ago

jlebon/packaging-committee pr/var-scriptlet-guidance into master

Scriptlets: recommend against writing to `/var`

Jonathan Lebon • 2 months ago

fedd908

guidelines/modules/ROOT/pages/Scriptlets.adoc

file modified

+30

		`@@ -198,6 +198,36 @@`
		`Of course, in the above situation`
		`it is better to use RPM file triggers if at all possible.`

		`+ === Modifying /var`
		`+`
		+ Apart from `+/var/tmp+` and the aforementioned `+/var/lib/rpm-state/+`
		+ directories, RPM scriptlets SHOULD NOT try to read from or write to `+/var+`.
		`+`
		`+ ==== Rationale`
		`+`
		`+ Image-based compose and update systems, such as OSTree, Image Builder, and`
		`+ container images run RPM scriptlets at compose time, e.g. on a build server.`
		`+ This means that the scripts don't have the opportunity to directly affect system`
		+ state in `+/var+`.
		`+`
		`+ On an OSTree system for example, a strong emphasis is placed on upgrades being`
		`+ offline, which means that an upgrade simply cannot affect the system in any`
		`+ visible way until the user actually decides to reboot into the new update.`
		`+`
		`+ ==== Alternatives`
		`+`
		`+ The common motivation for wanting to modify system state is to perform some sort`
		`+ of migration, cleanup, or to initialize some state. The common ways to address`
		`+ these in an image-friendly manner are:`
		`+`
		+ 1. A `+tmpfiles.d+` dropin: to initialize state in `/var` at boot time.
		`+ 2. A systemd service: to run imperative code at boot time, ordered before the`
		`+ service that depends on said code having executed. For one-time migrations, you`
		+ likely want to use e.g. a stamp file combined with `+ConditionPathExists=+`.
		`+ 3. Integrating into the dependent service: have the migration or cleanup be`
		`+ carried out by the very service that depends on it. This is often the simplest`
		`+ on the packaging side, but likely requires the upstream to support it.`
		`+`
		`== Snippets`

		`Some scriptlets to use in specific situations.`
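As an illustration of the first alternative in the diff above, a `tmpfiles.d` drop-in might look like the following. This is a minimal sketch, not part of the PR: the package name `foo` and the path `/var/lib/foo` are hypothetical.

```conf
# /usr/lib/tmpfiles.d/foo.conf  (hypothetical package "foo")
# Created at boot by systemd-tmpfiles instead of by a %post scriptlet.
# Type  Path           Mode  User  Group  Age  Argument
d       /var/lib/foo   0755  root  root   -    -
```

Because the file ships under `/usr`, it is part of the image, while the directory it describes is created on the client system at boot.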

jlebon commented 2 months ago

In the rpm-ostree variants of Fedora, we have repeatedly hit, over the
years, packages that want to perform migrations/cleanups as
part of e.g. their %post scripts. Such scripts do not work in those
variants because they run at compose time on the build server and not on
client systems (rpm-ostree also supports layering RPMs on the client, but
by design the same limitations apply there).

As a result, either the migration never takes place on those variants,
or the scriptlet fails to run and breaks the compose (because rpm-ostree
runs scriptlets with a read-only /var).

We (the Fedora CoreOS Working Group) would like to submit a Change
proposal that will as a first pass recommend against scriptlets writing
to /var.[[1]] As prescribed in the Changes policy[[2]], this is a
draft PR with the suggested change for review before actually submitting
a proposal. (An eventual goal would be to forbid it completely. That
could be done as part of this Change or a follow-up Change based on
discussions with the community and FESCo.)

ngompa commented 2 months ago

This is unreasonable. Without /var and without /usr, we are left with almost nothing. We use /var for persistent local and shared state in scriptlets. You should reconsider making /var RO during composes.

Edited 2 months ago by ngompa

jlebon commented 2 months ago

Hi Neal, thanks for looking!

> We use /var for persistent local and shared state in scriptlets.

This is accounted for in the proposal (see lines 7 and 8 of the diff).

walters commented 2 months ago

This looks good to me overall, but it could probably use some examples. https://bugzilla.redhat.com/show_bug.cgi?id=1657041 is one.

ngompa commented 2 months ago

> Hi Neal, thanks for looking!
>
> > We use /var for persistent local and shared state in scriptlets.
>
> This is accounted for in the proposal (see lines 7 and 8 of the diff).

I'm saying it's unreasonable to restrict it to just those locations. If an application has state that needs to be modified for any reason anywhere in /var, it needs to be permitted. For example, if display manager "state" needs to be modified on upgrade (to handle desktop session configuration transitions, for example), then it has to be possible. Fedora KDE did this in F34 and other desktops will do this over time for various reasons (mostly around Wayland transitions).

walters commented 2 months ago

> If an application has state that needs to be modified for any reason anywhere in /var, it needs to be permitted.

Of course it's permitted, it's just done by executing code as part of the service/app, not via %post.

ngompa commented 2 months ago

> > If an application has state that needs to be modified for any reason anywhere in /var, it needs to be permitted.
>
> Of course it's permitted, it's just done by executing code as part of the service/app, not via %post.

We do not have the ability to demand applications to change their architectures. So, no, this is not happening.

jlebon commented 2 months ago

> We do not have the ability to demand applications to change their architectures.

The proposal mentions this: if the application/service itself cannot carry out the migration, then a systemd unit can be shipped to do so.

Speaking at a very general level, migration code run in a %post script could just as well run in a systemd unit. On traditional systems, the systemd unit would run when the service is restarted. On an image-based system, it would run on the next boot. Obviously, the best course of action depends on the particular case. The guideline can only present the issue and provide some common approaches to solve it.
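A one-time migration unit of the kind described here could be sketched as follows. This is only an illustration under stated assumptions: the unit name `foo-migrate.service`, the helper `/usr/libexec/foo/migrate-state`, and the stamp file path are all hypothetical, and `/var/lib/foo` is assumed to exist already. How the unit gets pulled in (preset, dependency, or otherwise) is deliberately left out.

```ini
# /usr/lib/systemd/system/foo-migrate.service  (hypothetical name)
[Unit]
Description=One-time state migration for foo (illustrative)
# Skip the unit entirely once the stamp file exists.
ConditionPathExists=!/var/lib/foo/.migrated
# Run before the service that depends on the migrated state.
Before=foo.service

[Service]
Type=oneshot
# Hypothetical helper containing the code that used to live in %post.
ExecStart=/usr/libexec/foo/migrate-state
# Record success so the migration is not repeated on later boots.
ExecStart=/usr/bin/touch /var/lib/foo/.migrated
RemainAfterExit=yes
```

With `Type=oneshot`, the second `ExecStart=` only runs if the first one succeeded, so the stamp is written only after a successful migration.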

jlebon commented 2 months ago

To provide more context on this: this proposal is essentially documenting the status quo, at least for any package currently shipping in any OSTree-based variant today. Scriptlets in all those packages must already account for these limitations. The goal of having it in the packaging guidelines is to raise awareness and in the process reduce the likelihood of discovering compatibility issues for packagers, composers, and end-users.

ngompa commented 2 months ago

You realize that by forcing systemd units, we are forcing people to go through a whole secondary process to deal with things like presets, right? That's a lot of extra steps for stuff like this. We do not allow, as a matter of policy, people forcibly enabling systemd units in scripts unless specifically approved by FPC or FESCo as an exception.

That's why your solution is unacceptable and you need to rethink the limitations you're imposing on your system.

Edited 2 months ago by ngompa

walters commented 2 months ago

> We do not allow, as a matter of policy, people forcibly enabling systemd units in scripts unless specifically approved by FPC or FESCo as an exception.

There's a lot of irony though in having zero approval process for arbitrary code run as root on the build servers and client machines in %post.

AFAIK the rationale behind presets is mainly to control socket/network accessible services; I don't think we should have similar constraints for "operating system implementation details".

If it comes down to it I could imagine adding something to rpm which is like `%clientpost` which is explicitly defined to run on the end machine. But I dunno, it still seems like a bad pattern to encourage versus having transition code run as part of the existing codebase.

In the case you mentioned

> For example, if display manager "state" needs to be modified on upgrade (to handle desktop session configuration transitions, for example),

Isn't there already a systemd unit in kdm.service or whatever that could contain the relevant code?

tibbs commented 2 months ago

Honestly, this committee's role in things is to say whether these guidelines look good assuming that FESCo approves the feature request. Whether we agree with the feature itself isn't really the point here, and I don't think this is the right place for that discussion. FPC doesn't get a veto if FESCo approves something.

jlebon commented 2 months ago

> We do not allow, as a matter of policy, people forcibly enabling systemd units in scripts unless specifically approved by FPC or FESCo as an exception.

I think exceptions should be allowed for migration cases since they are not long-running services and are purely in service of another system service (pun intended). Another way to model this actually is to put the migration service unit into the .requires/ of the target service. That's better even than presets since it only runs the migration code if the service was actually in use in the first place.
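The `.requires/` approach could be sketched as below, written as plain shell standing in for what a spec's `%install` section might do. The unit names `foo.service` and `foo-migrate.service` are hypothetical, and `buildroot` falls back to a temporary directory so the snippet can be run on its own.

```shell
# Stand-in for %{buildroot}; a temporary directory when run outside rpmbuild.
buildroot=${buildroot:-$(mktemp -d)}
# Conventional location of packaged units (what %{_unitdir} expands to on Fedora).
unitdir=/usr/lib/systemd/system

# A symlink in foo.service.requires/ makes systemd treat foo-migrate.service
# as a Requires= dependency of foo.service, without any preset or enablement.
mkdir -p "$buildroot$unitdir/foo.service.requires"
ln -s ../foo-migrate.service \
    "$buildroot$unitdir/foo.service.requires/foo-migrate.service"
```

A requirement dependency does not imply ordering, so the migration unit would still need `Before=foo.service` to run first.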

Edited 2 months ago by jlebon

walters commented 2 months ago

> We do not have the ability to demand applications to change their architectures. So, no, this is not happening.

It's not about demanding. The role of this issue is leading the larger "we" in improving the FOSS ecosystem. See for example https://github.com/linux-nvme/nvme-stas/issues/130 where we successfully converted the nvme tooling to use a systemd unit instead of a package script.

salimma commented 2 months ago

> > We do not have the ability to demand applications to change their architectures.
>
> The proposal mentions this: if the application/service itself cannot carry out the migration, then a systemd unit can be shipped to do so.
>
> Speaking at a very general level, migration code run in a %post script could just as well run in a systemd unit. On traditional systems, the systemd unit would run when the service is restarted. On an image-based system, it would run on the next boot. Obviously, the best course of action depends on the particular case. The guideline can only present the issue and provide some common approaches to solve it.

Would this not be burdensome for applications that might not even be meant to run as a service, but do put files in /var?

Also this seems to assume the runtime environment has systemd, which IIRC won't be true for containers.

jlebon commented 2 months ago

> Also this seems to assume the runtime environment has systemd, which IIRC won't be true for containers.

Some containers do run systemd, but yeah most don't. But actually the larger issue with containers is that one usually does not update a long-living container. Instead it's nuked and replaced by a new built image (Flatpaks even more so). So %post scripts don't have a chance to run at all there to migrate data volumes mounted in.

In those cases, the service/app kinda already has to be able to deal with these upgrade situations seamlessly and not rely on distros handling it in scriptlets (or of course, they're free to e.g. request users to manually run some tool or to nuke the data volumes, etc... -- the main point is that this is already an issue today and not specific to OSTree).

ngompa commented 2 months ago

This is only true for OCI "app" containers, and not true for OCI "toolbox" containers, nor is it true for LXC or nspawn containers, nor is it true for VM workloads or regular user workloads.

This issue is specific to RPM-OSTree workloads, just like how RPM-OSTree still to this day cannot handle Lua scriptlets.

Edited 2 months ago by ngompa

jlebon commented 2 months ago

> This is only true for OCI "app" containers, and not true for OCI "toolbox" containers.

It's not common for pet containers to run long-running services that manage persistent state.

Also, many people (including me) do in fact nuke and recreate pet containers routinely on top of newer images. Both toolbox and distrobox document this approach.

> nor is it true for VM workloads or regular user workloads.

I suppose this assumes those systems don't do transactional updates, which brings us back to this ticket. :)

jlebon commented 2 months ago

Note openSUSE packaging guidelines have similar wording to what is being proposed here:
- https://en.opensuse.org/openSUSE:Packaging_guidelines#Migration_.2F_Upgrades
- https://en.opensuse.org/openSUSE:Packaging_for_transactional-updates#Files_in_/var

The MariaDB example there has an analogue in their upstream container docs: see the MARIADB_AUTO_UPGRADE variable to automatically run migration code on container startup: https://mariadb.com/kb/en/mariadb-server-docker-official-image-environment-variables/#mariadb_auto_upgrade-mariadb_disable_upgrade_backup.

Both of these are examples in transactional update systems that are not OSTree that must also deal with state in /var.

james commented 2 months ago

I'm in two minds on this.

FPC is generally more about saying what is best/common practice and having words that explain it to people who aren't familiar. In some cases "new" things happen and we have to provide guidance; in very rare cases things change and we have to tell people to stop doing certain things the way they used to (e.g. the /usr move).

This doesn't feel like any of that.

On the other hand, I personally would love it if we moved closer to a world where %post didn't exist and I think long term that's where we should be preparing to go ... and I know I'm not alone in both thoughts.

Four thoughts come to mind, in rough order of joy:

1. As part of the ostree/bifrost/whatever projects, create a systemd "post upgrade running environment" service. Basically it would run things "immediately" after the rpm transaction in a traditional environment, but otherwise it would run at first boot in a container/ostree/etc. environment. With only a small amount of thought this service would have a way to drop a script into a directory somewhere, and that script/binary only runs once ... but also some mechanism to refresh (so it will run once again, even if the script didn't change). The migration from using %post to using this service should be as nice as possible.
2. Change the wording so that it only applies to changes within a release. Also maybe don't just outright ban reading from /var; make it obvious what reading will work and what will have problems.
3. "not really a solution": Change the wording so that if a package needed to do this then they'd have to mark the package in some way (Provides: upgrade-dirty(/var)). Again, it doesn't solve it, but it hopefully makes it easier to see where the problems are.
4. "political dumpster fire": Go argue with FESCo and the community that we need to do this.

jlebon commented 2 months ago

> On the other hand, I personally would love it if we moved closer to a world where %post didn't exist and I think long term that's where we should be preparing to go ... and I know I'm not alone in both thoughts.

:heart:

> As part of ostree/bifrost/whatever projects create a systemd "post upgrade running environment" service. Basically it would run things "immediately" after the rpm transaction in a traditional environment, but otherwise it would run at first boot in a container/ostree/etc. environment. With only a small amount of thought this service would have a way to drop a script into a directory somewhere, and that script/binary only runs once ... but also some mechanism to refresh (so it will run once again, even if the script didn't change). Having the migration from using %post to using this service should be as nice as possible.

This is similar to Colin's idea in https://pagure.io/packaging-committee/pull-request/1333#comment-198470 I think (i.e. trying to provide some abstraction to make it just work in both cases).

I think we could do that if it helps with adoption. OTOH, I do worry that the major difference in implementation details might add confusion (vs. if it's a bona fide systemd unit, then it's easier to explain: it gets run immediately on traditional systems, but on reboot in transactional systems). For services that could also be containerized, it would still be preferable if the service itself can take care of this sort of migration.
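To make the drop-in-directory idea concrete, the run-once behaviour with a refresh mechanism could be sketched roughly as follows. This is a hypothetical illustration, not an existing tool: the function name `run_once_dir`, the directory layout, and the `.done` stamp naming are all assumptions.

```shell
# run_once_dir SCRIPT_DIR STAMP_DIR
# Runs each executable file in SCRIPT_DIR at most once, recording a stamp
# file per script in STAMP_DIR. Deleting a stamp "refreshes" that script,
# so it runs again next time even if the script itself did not change.
run_once_dir() {
    script_dir=$1
    stamp_dir=$2
    mkdir -p "$stamp_dir"
    for script in "$script_dir"/*; do
        # Skip directories, non-executables, and the unexpanded glob.
        [ -f "$script" ] && [ -x "$script" ] || continue
        stamp="$stamp_dir/$(basename "$script").done"
        # Already ran successfully: nothing to do.
        [ -e "$stamp" ] && continue
        # Only record the stamp if the script succeeded,
        # so a failed migration is retried on the next run.
        if "$script"; then
            : > "$stamp"
        fi
    done
}
```

In the model discussed above, a service would call something like this right after the RPM transaction on a traditional system and at boot on an image-based one, with the stamp directory living somewhere persistent such as under `/var/lib`.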

Is the next step for this now to create a Fedora 41 Change Proposal? We'll definitely want to make sure to capture some of the discussions we've had here.