#1918 Proper Conditionalization of default systemd processes
Closed: Fixed 4 years ago Opened 4 years ago by gbcox.

There are three processes I am aware of that are failing and causing systemd to report it is running in a degraded state:

dbxtool.service loaded failed failed Secure Boot DBX (blacklist) updater
mcelog.service loaded failed failed Machine Check Exception Logging Daemon
rngd.service loaded failed failed Hardware RNG Entropy Gatherer Daemon

The mcelog.service message is associated with rhbz#1166978
The dbxtool.service message is associated with rhbz#1508808
The rngd.service message is associated with rhbz#1490632

These processes fail because they assume the target system has the required hardware. If it doesn't, they fail and systemd reports the system as degraded. Instead, these processes should check that the required hardware exists; if it does not, they should exit without causing systemd to report that the system is running degraded.

We should have a standard which states that before a systemd process is made default for everybody, it should be properly conditionalized. There may be more processes which have this issue, but these are the three I found which were affecting my system.


Isn't this covered by https://fedoraproject.org/wiki/Packaging:Systemd#Hardware_activation already? If not, can you work with the FPC to come up with a proposal?

It's not directly covered there, but I agree it appears that is probably the place for it. I'll work with the FPC on this. This issue for fesco can probably be closed out as FPC is probably the more appropriate venue. Thanks Josh.

Metadata Update from @till:
- Issue close_status updated to: Duplicate

4 years ago

I reread the rngd.service bug (rhbz#1490632) and the other two bugs, and I think FESCo should weigh in. There are various guidelines which should cover this case, but they are obviously not clear enough, and the discussion (and implementation!) has been going round in circles since F26 at least. I don't think this is FPC material, because it's not about packaging, but about a disagreement over the proper behaviour of the default installation. I think FESCo should at least establish the high-level guideline before handing this back to FPC.

The root of the problem is a disagreement about what should happen with a service for hardware which might or might not be present. The first question is whether such a service should be enabled by default at all. The second question is how the service should behave when enabled, and in particular whether it should fail if the hardware is not present.

I'd like FESCo to approve something like this:
"""
For all three cases under discussion (rngd.service, mcelog.service, dbxtool.service), those services should remain enabled by default through presets, to allow users to derive maximum benefit from their hardware. If the hardware is not available, the service should emit an appropriate informational message to the log and exit cleanly, so that lack of hardware is not reported as an error. This may be implemented through systemd conditionals [1] or similar, or internally in the service executable. If desired, an opt-in mechanism can be provided to make the service fail instead. The same "opportunistic" policy applies to other cases of services which enhance or make use of hardware without need for user configuration or interaction.

[1] https://www.freedesktop.org/software/systemd/man/systemd.unit.html#ConditionArchitecture=
"""
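
For illustration, here is a minimal Python sketch of the "log and exit cleanly" behaviour when the check lives inside the service executable. Everything named here is an assumption made for the example: the device path, the `EXAMPLE_STRICT` environment variable standing in for the opt-in "fail instead" switch, and the function names. It is not how rngd, mcelog or dbxtool actually do it.

```python
import logging
import os

# Assumption for this sketch: the service only needs this one device node.
DEVICE_PATH = "/dev/hwrng"


def hardware_present(path: str = DEVICE_PATH) -> bool:
    """Return True if the device node the service depends on exists."""
    return os.path.exists(path)


def exit_status(present: bool, strict: bool = False) -> int:
    """Map the probe result to a process exit status.

    Missing hardware is a clean exit (0) unless the admin opted in to
    strict mode, in which case it is reported as a failure (1).
    """
    if present or not strict:
        return 0
    return 1


def main() -> int:
    logging.basicConfig(level=logging.INFO)
    present = hardware_present()
    # Hypothetical opt-in switch; a real service would document its own.
    strict = os.environ.get("EXAMPLE_STRICT") == "1"
    if present:
        logging.info("found %s; the real work would start here", DEVICE_PATH)
    else:
        logging.info("%s not present; nothing to do", DEVICE_PATH)
    return exit_status(present, strict)
```

A real entry point would finish with `sys.exit(main())`, so that systemd sees exit status 0 on machines without the hardware and the unit is not marked as failed.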

Metadata Update from @zbyszek:
- Issue status updated to: Open (was: Closed)

4 years ago

I generally agree with @zbyszek here. I think I might rephrase the proposal somewhat differently, though. I'm also adding the meeting keyword so we can discuss this at Monday's meeting.

Proposal (to be used as the requirements spec for creating packaging guidelines):

  • Services that meet all of the other requirements for starting by default are permitted to be enabled in systemd presets.
  • Services that are only useful on a subset of hardware must exit gracefully and without marking the service as "failed" (from systemd's perspective) if that hardware is not present. This may be accomplished through systemd's conditionals, by having the service itself (or a wrapper script around it) exit with a zero return code, or similar functionality.
  • If there is value in the service optionally failing (such as if hardware that is expected to be present disappears or stops functioning), this may be made available using one of the following methods:

    • As an opt-in mechanism involving a userspace tool with proper documentation
    • The service may store state on the disk (in an appropriate location in /var) to use as a baseline for detecting the failure.

I think this covers the existing cases as well and holds that the bugs filed against them are valid and must be resolved in keeping with the rules set forth.
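
As a rough sketch of the "store state on disk" option: the service records that it has seen its hardware once, and only treats missing hardware as a failure if it was there before. The state file location, the JSON layout and the function name are all assumptions made for this example, not something taken from an existing service.

```python
import json
from pathlib import Path


def check_against_baseline(present: bool, state_file: Path) -> int:
    """Return an exit status based on the probe result and the saved baseline.

    0: hardware is present, or it has never been seen on this machine.
    1: hardware was seen on an earlier run and is missing now.
    """
    seen_before = False
    if state_file.exists():
        try:
            seen_before = bool(json.loads(state_file.read_text()).get("seen"))
        except (OSError, ValueError, AttributeError):
            # Unreadable or malformed state is treated as "no baseline".
            seen_before = False

    if present:
        if not seen_before:
            # First sighting: record the baseline for later runs.
            state_file.parent.mkdir(parents=True, exist_ok=True)
            state_file.write_text(json.dumps({"seen": True}))
        return 0

    return 1 if seen_before else 0
```

A caller would pass a path under /var (for example a hypothetical `/var/lib/example-service/state.json`) together with the result of its own hardware probe, and use the return value as the process exit status.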

Metadata Update from @sgallagh:
- Issue tagged with: meeting

4 years ago

Since FESCo has decided to discuss this I will hold off on submitting a revision to the Default Services guideline to ensure it conforms to whatever requirements spec is approved. Or if you folks want to revise the guidelines yourself, that is fine also.

I agree with what @sgallagh has written. My only comment is that if a particular systemd conditional doesn't currently exist, I think it would be advantageous to request it. For example, Lennart recently created ConditionSecurity=uefi-secureboot as a direct result of the email thread. I identified that the RDRAND instruction is required for the random number generator in both AMD and Intel processors, and that its presence can be determined via CPUID. mcelog capability can be determined with mcelog --is-cpu-supported.
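
To make the RDRAND check concrete, here is a small Python sketch. It assumes an x86 Linux system, where the kernel lists CPUID-derived feature bits on the `flags` lines of /proc/cpuinfo; reading that file instead of executing CPUID directly is a simplification for the example, and the function names are made up.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set:
    """Collect feature flags from /proc/cpuinfo-style text (x86 'flags' lines)."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_rdrand(cpuinfo_text: str) -> bool:
    """Return True if any CPU entry in the text advertises the rdrand flag."""
    return "rdrand" in cpu_flags(cpuinfo_text)


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Read the cpuinfo file, returning an empty string if it is unavailable."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""
```

A service or wrapper could call `has_rdrand(read_cpuinfo())` and exit with status 0 when it returns False.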

It may be that the systemd team for whatever reason decides creation of another condition is not possible or justified - and that is fine... but they can't make that determination unless they are asked.

To make sure my thoughts on the mailing list end up with the rest of the discussion, I'll note that in https://lists.fedoraproject.org/archives/list/packaging@lists.fedoraproject.org/message/TLKJPA4EBH7KRAJRF7BK2XBXLOF22XB2/ I made some suggestions about how the packaging guidelines would handle this.

To summarize, modify https://fedoraproject.org/wiki/Packaging:DefaultServices to require that services which are enabled by default must not in the course of normal operation fail to start in such a way that system state is anything other than 'running'.

Additionally, we could note that services which are dependent on the presence of specific hardware should perhaps be hardware activated if possible. But I have no idea if it is possible to hardware activate something like rngd or mcelog. If it is reasonably possible then that seems like a far better option than what we have now.

@gbcox:

it would be advantageous if a particular systemd conditional doesn't currently exist, that it be requested.

Systemd conditionals make sense where there's a clear and general condition that can be checked. "uefi-secureboot" is such a condition, and it's nice that Lennart quickly added it. But in general this wouldn't be the case. For example, see the rngd.service case. The service supports a few different mechanisms of obtaining random numbers, and expressing this as systemd conditionals would be at best messy. And the service executable already looks for any supported random sources and checks if any of them are usable. Duplicating this functionality in the unit file is just more work.

I don't think we should encourage asking for a systemd conditional, because:
- the answer would usually be "no"
- such a conditional would only apply to rawhide and later, so it doesn't work as a packaging solution

@tibbs:

modify https://fedoraproject.org/wiki/Packaging:DefaultServices to require that services which are enabled by default must not in the course of normal operation fail to start in such a way that system state is anything other than 'running'.

That page already contains the phrase "service does not require manual configuration to be functional" in various places, which implies that there is no "failure". I think @sgallagh's wording in https://pagure.io/fesco/issue/1918#comment-518092 is better, because it covers this case explicitly.

we could note that services which are dependent on the presence of specific hardware should perhaps be hardware activated if possible

Yes, as long as it is understood that automatic support is strongly encouraged, and this is understood to be just a suggestion as to the implementation mechanism.

I think it is important that Fedora makes the best of available hardware as much as possible. The other solution, where the service is disabled by default and requires opt-in, is IMHO a 90's style solution that we should avoid as long as it is possible to support hardware through auto-detection. So I'd like to see some sentence in the guidelines that encourages such automatic support.

  • If certain hardware can be used without configuration, services which provide this support should be enabled by default.
  • Those services must meet the other requirements for starting by default.
  • Services that are only useful on a subset of hardware must exit gracefully and without marking the service as "failed" (from systemd's perspective) if that hardware is not present. This may be accomplished through systemd's conditionals, by having the service itself (or a wrapper script around it) exit with a zero return code, or similar functionality.
  • If there is value in the service optionally failing (such as if hardware that is expected to be present disappears or stops functioning), this may be made available using an opt-in userspace mechanism or by storing information about detected hardware on disk.

Amended Proposal from FESCo Meeting:

  • Services that meet all of the other requirements for starting by default are permitted to be enabled in systemd presets.
  • If certain hardware can be used without configuration, services which provide this support should be enabled by default.
  • Services that are only useful on a subset of hardware must exit gracefully and without marking the service as "failed" (from systemd's perspective) if that hardware is not present. This may be accomplished through systemd's conditionals, by having the service itself (or a wrapper script around it) exit with a zero return code, or similar functionality.
  • If there is value in the service optionally failing (such as if hardware that is expected to be present disappears or stops functioning), this may be made available using one or more of the following methods:

    • As an opt-in mechanism involving a userspace tool with proper documentation
    • The service may store state on the disk (in an appropriate location in /var) to use as a baseline for detecting the failure.

I think this covers the existing cases as well and holds that the bugs filed against them are valid and must be resolved in keeping with the rules set forth.
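
The "wrapper script" route from the third bullet could look roughly like the Python sketch below. It assumes the probe command signals support through a zero exit status; whether a given tool (for example the `mcelog --is-cpu-supported` call mentioned earlier in this ticket) behaves that way should be checked against its documentation. The function names and the message text are made up for the example.

```python
import subprocess
import sys


def probe_succeeds(probe_cmd: list) -> bool:
    """Run the probe; a missing or unrunnable probe counts as 'not supported'."""
    try:
        return subprocess.run(probe_cmd, check=False).returncode == 0
    except OSError:
        return False


def wrapper_status(probe_cmd: list, daemon_cmd: list) -> int:
    """Start the daemon only if the probe succeeds.

    Returns 0 when the hardware is not supported, so systemd does not
    mark the unit as failed; otherwise returns the daemon's own status.
    """
    if not probe_succeeds(probe_cmd):
        print("hardware not supported; not starting", file=sys.stderr)
        return 0
    return subprocess.run(daemon_cmd, check=False).returncode
```

A wrapper built on this would end with something like `sys.exit(wrapper_status(probe, daemon))`, where both command lists come from the packager.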

AGREED: FESCo will submit the proposal in https://pagure.io/fesco/issue/1918#comment-518281 to the FPC (+6, 0, -0) (sgallagh, 15:33:08)

Metadata Update from @sgallagh:
- Issue untagged with: meeting
- Issue close_status updated to: Fixed

4 years ago
