#228 Enable full preemption
Opened a year ago by chrismurphy. Modified 5 days ago

Now that a single kernel can support the none/voluntary/full preemption modes, we should evaluate the pros and cons of preempt=voluntary (the current default) versus preempt=full.

Relevant message in the devel@ thread: "Dynamic preemption support in Linux 5.12 kernel".

This would be configured with a boot parameter, so it could be an edition/spin-specific option. But we need to figure out how to add it in a configurable way. Maybe it can be done with a dracut drop-in config file?

There's other work happening with resource control, so part of the investigation should consider any conflicts, or other downsides.

@benzea


Metadata Update from @chrismurphy:
- Issue tagged with: pending-action

a year ago

If we decide to do this, I would prefer that we make it the default for all variants unless someone specifically wants to opt out. Pretty much everything I have read on this seems to indicate the downsides are relatively minimal across the board.

I believe that this issue can be considered entirely independent from other resource control problems.

Personally, I am wondering how exactly preemption helps here. Is the problem that spinning up another core is costly, so we should preempt kernel tasks in order to run userspace earlier? I.e.:

CPU0: [kernel input] -> [other kernel task] -> userspace input -> ...
vs. with preemption:
CPU0: [kernel input] -> userspace input -> [other kernel task] -> ...

I've asked Tejun Heo about this, and he's confirmed there shouldn't be any impact on resource control specifically (and if there is, it's a bug we should fix).

I expect that for the desktop case it is a win across the board. There are certain server workloads where it can be a problem. I am not particularly comfortable switching defaults in the middle of a stable release, though, particularly when we have so much testing with preempt voluntary (RHEL uses voluntary as well, and does a lot more performance testing than we do). Hopefully we can get a patch that lets us keep the default and still enable dynamic preemption, so that F33/F34 users can benefit.

Wouldn't it make sense for these to be configurable through the tuning daemon (tuned)? That seems like what this is for...

BTW, this is how to change the preemption mode on the fly:

# cat /sys/kernel/debug/sched_preempt
(none) voluntary full

# echo full > /sys/kernel/debug/sched_preempt

# cat /sys/kernel/debug/sched_preempt
none voluntary (full) 
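The file lists every mode and marks the active one in parentheses. A minimal Python sketch that parses that format, assuming the parenthesized-active-mode convention shown in the transcript above holds; the helper names are hypothetical, and the sample strings are copied from the transcript rather than read live, since the debugfs file needs root.

```python
import re


def active_preempt_mode(sched_preempt: str) -> str:
    """Return the mode shown in parentheses, e.g. "none voluntary (full)" -> "full"."""
    match = re.search(r"\((\w+)\)", sched_preempt)
    if match is None:
        raise ValueError(f"no active mode marked in {sched_preempt!r}")
    return match.group(1)


def available_preempt_modes(sched_preempt: str) -> list[str]:
    """Return every listed mode, with the parentheses around the active one removed."""
    return sched_preempt.replace("(", "").replace(")", "").split()


if __name__ == "__main__":
    # Sample strings copied from the transcript instead of reading debugfs,
    # which needs root and may be blocked by kernel lockdown.
    before = "(none) voluntary full"
    after = "none voluntary (full) "
    print(available_preempt_modes(before))
    print(active_preempt_mode(before), "->", active_preempt_mode(after))
```

A "preemptctl"-style tool could wrap exactly this parse plus the echo shown above.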

It would be nice to make a preemptctl tool.

Sounds like Workstation should switch to full preemption for F35.

I agree with Justin that there's no strong reason to make major changes like this in a stable release.

Wouldn't it make sense for these to be configurable through the tuning daemon (tuned)? That seems like what this is for...

We do not install tuned in Workstation.

Why not? It's easy to do.

Nobody has ever proposed it afaik.

/sys/kernel/debug/sched_preempt

debugfs is subject to kernel lockdown, which is in effect on UEFI Secure Boot systems, so currently it can't be checked or changed at runtime.

[  531.180062] Lockdown: cat: debugfs access is restricted; see man kernel_lockdown.7

Hmm, apparently there's now a GUI applet for tuned made by @xvitaly: https://github.com/EasyCoding/tuned-switcher

System tray applet is fully functional:

  • full D-Bus support;
  • profile monitoring (will report if another application changes the profile);
  • easy profile switching;
  • shows currently selected profile;
  • supports automatic mode.

GUI version is still in development.

It is packaged and can be installed via sudo dnf install tuned-switcher.

The Workstation UI for such things is in the control-center.

The power-profiles-daemon developers said that it will not work on desktop Ryzen PCs, for example, and desktop Ryzens are not the only systems where it doesn't work. This may only be a temporary limitation, so this is just FYI.

This ticket is tagged with pending-action and the F35 milestone. What needs to be done?

Nothing yet, as the patch that allows us to switch the preemption mode from where we are doesn't seem to exist yet. I pinged Lutro on the fedora-devel thread to see if it is in progress. Once that patch goes in (and the resulting kernel is available as a rebase target for stable Fedora), that is the time to start testing a switch of the Workstation defaults. Users on stable Fedora will be able to make the switch manually, and then you can decide on setting a default for the next release. As it doesn't seem to be in 5.14, it will likely not be in the install kernel for F35.

Metadata Update from @aday:
- Issue set to the milestone: Fedora 36 (was: Fedora 35)

8 months ago

Summary of a chat with @jforbes:

  • Fedora kernels have, and will continue to have, voluntary preemption
  • CONFIG_PREEMPT_DYNAMIC just arrived in the 5.16 merge window, and is needed to change to full preemption using the kernel parameter preempt=full
  • We would need anaconda to add preempt=full to the boot loader configuration, so that it's applied to new desktop installations
  • Discuss whether to ship a desktop-only RPM containing just a scriptlet that has grubby set the boot parameter, so that it's applied on upgrades
  • Changing it via debugfs might be possible, but is untested, including whether it gives the same results as the command-line parameter
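Since the boot parameter is the supported path, a quick way to see whether a given boot actually requested full preemption is to look for preempt= on the kernel command line. A minimal Python sketch; the helper name is hypothetical, it only reports what was requested at boot (not a later debugfs change), and it assumes the last preempt= occurrence is the one that takes effect.

```python
import os
from typing import Optional


def preempt_from_cmdline(cmdline: str) -> Optional[str]:
    """Return the value of the last preempt= argument, or None if absent."""
    value = None
    for arg in cmdline.split():
        key, sep, val = arg.partition("=")
        if key == "preempt" and sep:
            value = val  # assumption: a later occurrence overrides an earlier one
    return value


if __name__ == "__main__" and os.path.exists("/proc/cmdline"):
    with open("/proc/cmdline") as f:
        mode = preempt_from_cmdline(f.read())
    print(mode or "no preempt= argument (kernel default)")
```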

kernel crystal ball says:

the v5.16 kernel predictions: merge window closes on Sunday, 2021-11-14 and release on Sunday, 2022-01-09
the v5.17 kernel predictions: merge window closes on Sunday, 2022-01-23 and release on Sunday, 2022-03-20

The Fedora 36 key schedule says:

Branch Fedora Linux 36 from Rawhide     Tue 2022-02-08

So it does seem plausible to get it done for Fedora 36, but it needs testing, and a decision about whether to enable it on new installations only or also on upgrades.

http://phb-crystal-ball.org/
https://fedorapeople.org/groups/schedule/f-36/f-36-key-tasks.html
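To make the timing concrete, here is the gap between the dates quoted above, as a small Python check. The kernel dates are crystal-ball predictions, not confirmed releases, so the result is only as good as those predictions.

```python
from datetime import date

# Dates copied from the quotes above; the kernel dates are predictions.
V5_16_RELEASE = date(2022, 1, 9)
V5_17_MERGE_WINDOW_CLOSE = date(2022, 1, 23)
F36_BRANCH = date(2022, 2, 8)


def days_between(start: date, end: date) -> int:
    """Number of days from start to end (negative if end comes first)."""
    return (end - start).days


if __name__ == "__main__":
    print("v5.16 release -> F36 branch:", days_between(V5_16_RELEASE, F36_BRANCH), "days")
    print("v5.17 merge window close -> F36 branch:",
          days_between(V5_17_MERGE_WINDOW_CLOSE, F36_BRANCH), "days")
```

If the predictions hold, that is 30 days between the v5.16 release and the branch point, which is the window for testing preempt=full before deciding on a default.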

Importantly, running with preempt=full at all needs some significant testing across desktop use cases as all Fedora and RHEL kernels have shipped with preempt=voluntary for a very long time. I honestly do not expect there to be issues, but we need to be sure. FWIW, I plan to switch over to 5.16-rc kernels for desktop use on at least one machine here to give preempt=full some testing as soon as I think things have stabilized enough (the 5.16 MR has been more painful than average so far). But I am probably not representative of the average Fedora workstation user as I still spend the majority of my time in terminals, and I don't have a btrfs filesystem anywhere here.

CONFIG_PREEMPT_DYNAMIC just arrived in the 5.16 merge window

As far as I know, CONFIG_PREEMPT_DYNAMIC arrived in the 5.12 merge window:

found in Linux kernels: 5.12–5.15, 5.15+HEAD

https://cateee.net/lkddb/web-lkddb/PREEMPT_DYNAMIC.html

running with preempt=full at all needs some significant testing across desktop use cases

Many desktop distros successfully use CONFIG_PREEMPT by default. Seems like full preemption is already well-tested.

CONFIG_PREEMPT_DYNAMIC just arrived in the 5.16 merge window

As far as I know, CONFIG_PREEMPT_DYNAMIC arrived in the 5.12 merge window:

found in Linux kernels: 5.12–5.15, 5.15+HEAD

https://cateee.net/lkddb/web-lkddb/PREEMPT_DYNAMIC.html

Yes, but no. It did, but in a form I was not willing to enable (it required preempt=full as the default). There was a long thread about it on fedora-devel. 5.16 is where the options came in that actually allow me to enable it for Fedora.

running with preempt=full at all needs some significant testing across desktop use cases

Many desktop distros successfully use CONFIG_PREEMPT by default. Seems like full preemption is already well-tested.

I am not too concerned about "many desktop distros". I know what Fedora and Red Hat testing looks like; what the rest of the world does is not of much concern to me. RHEL in particular has a fairly substantial performance testing setup, and preempt=full was never tested there. Voluntary has been the default for everything but s390 for a very long time. I want to see real testing. Luckily F38 is still a ways away, so we have time to do some of that testing.

From the pykickstart docs, it looks like it would be:

bootloader --append "preempt=full"

If it's done via sysfs or debugfs instead, it could be done with a drop-in in /etc/udev/rules.d.
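If a kickstart already carries other kernel arguments in --append, the preempt setting has to be merged in rather than added twice. A minimal Python sketch of that merge, assuming a space-separated argument list inside a double-quoted --append value as in the pykickstart form above; both helper names are hypothetical, not part of pykickstart or anaconda.

```python
def with_preempt_full(append_args: str) -> str:
    """Return append_args with any existing preempt= value replaced by preempt=full."""
    args = [a for a in append_args.split() if not a.startswith("preempt=")]
    args.append("preempt=full")
    return " ".join(args)


def bootloader_line(existing_append: str = "") -> str:
    """Build a kickstart bootloader line whose --append includes preempt=full."""
    return f'bootloader --append="{with_preempt_full(existing_append)}"'


if __name__ == "__main__":
    print(bootloader_line())
    print(bootloader_line("rhgb quiet"))
    print(bootloader_line("preempt=voluntary quiet"))
```

Dropping any earlier preempt= keeps the generated line unambiguous instead of relying on which occurrence the kernel honors.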

We discussed this ticket at yesterday's working group meeting. The group is generally positive about pursuing full preemption. @chrismurphy has kindly agreed to monitor the situation and keep us informed.

5.16.0-0.rc4.29.fc36.x86_64+debug

# cat /sys/kernel/debug/sched/preempt 
cat: /sys/kernel/debug/sched/preempt: Operation not permitted
# dmesg
...
[  189.486988] Lockdown: cat: debugfs access is restricted; see man kernel_lockdown.7

I think that with kernel lockdown in place, due to UEFI Secure Boot being enabled here, it's not possible to change this via debugfs, so it'll need to be done with a kernel boot parameter, which means doing it via an anaconda kickstart.
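Before trying the debugfs route, a script can check the lockdown state first. A minimal Python sketch, assuming the lockdown LSM's securityfs file /sys/kernel/security/lockdown, which lists the levels and brackets the active one (e.g. "none [integrity] confidentiality"). It also assumes, consistent with the dmesg output above, that any level other than "none" blocks access to the sched preempt file; the helper names are hypothetical.

```python
import re

LOCKDOWN_PATH = "/sys/kernel/security/lockdown"  # assumption: lockdown LSM state file


def lockdown_mode(contents: str) -> str:
    """Return the bracketed (active) lockdown level, e.g. "integrity"."""
    match = re.search(r"\[(\w+)\]", contents)
    if match is None:
        raise ValueError(f"no active lockdown level in {contents!r}")
    return match.group(1)


def debugfs_preempt_usable(contents: str) -> bool:
    # Assumption: with lockdown at "integrity" or above, the writable
    # sched preempt file in debugfs is refused, as in the dmesg output above.
    return lockdown_mode(contents) == "none"


if __name__ == "__main__":
    try:
        with open(LOCKDOWN_PATH) as f:
            contents = f.read()
    except OSError as err:
        print(f"cannot read {LOCKDOWN_PATH}: {err}")
    else:
        print(f"lockdown={lockdown_mode(contents)}, "
              f"debugfs preempt switch usable: {debugfs_preempt_usable(contents)}")
```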

Can we have OpenQA run through testing all variants with preempt=full by default? Or organize some kind of test week to test server, cloud, and workstation workloads with this mode?

This would certainly be a good idea before changing the default for Workstation. I don't know that Server is interested, given that RHEL seems happy with the voluntary default for kernel-ark right now, and I am not sure that Cloud would be either. The main benefit is for interactive/desktop-type workloads. I do plan to add it to the test plan for the 5.16 kernel test week, and possibly even do a preempt=full ISO.

Sample size of 1: I've been running it for a few days now along with bcc-tools fileslower, and I'm not seeing anything out of the ordinary compared to voluntary. Of course, fileslower only looks at I/O latency at the VFS layer, so it's a very narrow view.

Can we have OpenQA run through testing all variants with preempt=full by default? Or organize some kind of test week to test server, cloud, and workstation workloads with this mode?

This would certainly be a good idea before changing the default for Workstation. I don't know that Server is interested, given that RHEL seems happy with the voluntary default for kernel-ark right now, and I am not sure that Cloud would be either. The main benefit is for interactive/desktop-type workloads. I do plan to add it to the test plan for the 5.16 kernel test week, and possibly even do a preempt=full ISO.

Well, from the Cloud side (cc: @davdunc), I can say we're interested in any change that might potentially benefit us. From the Server side, data is always interesting when trying to figure out how to make things better. Doing that lets us figure out whether there's value or difference in being different across variants on this point.

Personally, I prefer to minimize differences across Fedora variants where reasonably possible. So it makes sense to test across all workload types, to see whether it's worth maintaining a delta between desktop and non-desktop variants.

Metadata Update from @chrismurphy:
- Issue set to the milestone: Fedora 37 (was: Fedora 36)

2 months ago

Working full preemption is available with the preempt=full boot parameter in 5.16+ kernels. I've been using it since it became available, and I think it's ready for wider testing. How should we do that?

Discussed at the meeting: chrismurphy will pull together a system-wide change proposal for F37 and coordinate with QA on one or more test days.

Just to be clear, this system-wide change proposal is for Workstation only.

No, we're planning to propose this change for all editions. Certainly there's no reason for Workstation to differ from other desktop editions. As for server editions, I'll let Neal or Chris comment.

There absolutely is, which is why I did not make preempt=full the default.

I have some trouble with the backlight on my laptop when preempt=full is set: the brightness level has dropped noticeably, even though the slider is at 100%. I have an OLED display and use the ICC Brightness tool: https://github.com/udifuchs/icc-brightness.

preempt=full works great on my PC, with a much better experience. I haven't noticed any issues yet.

I wonder: could preempt=full affect battery life on laptops?

There absolutely is, which is why I did not make preempt=full the default.

There isn't sufficient proof to indicate we should not at least try across the board. I've discussed it with @davdunc, @dcavalca, and @salimma in Fedora Cloud/Server and we are at least open to trying it. The data we have says it will at worst be net-neutral.

FWIW, I've been running preempt=full on my F36 desktop for a while now, working and gaming without any issue.
