#2320 F32 System-Wide Change: Enable EarlyOOM
Closed: Accepted 2 months ago by decathorpe. Opened 3 months ago by bcotton.

Install earlyoom package, and enable it by default. If both RAM and swap go below 10% free, earlyoom issues SIGTERM to the process with the largest oom_score. If both RAM and swap go below 5% free, earlyoom issues SIGKILL to the process with the largest oom_score. The idea is to recover from out of memory situations sooner, rather than the typical complete system hang in which the user has no other choice but to force power off.
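
As a rough illustration of the two-threshold behaviour described above, here is a minimal Python sketch. It is not earlyoom's actual code: the names `choose_signal`, `pick_victim`, `TERM_PERCENT` and `KILL_PERCENT` are hypothetical, and the free percentages and oom_score values are assumed to have been collected elsewhere.

```python
import signal
from typing import Dict, Optional

# Thresholds taken from the proposal text; the constant names are hypothetical.
TERM_PERCENT = 10.0  # both RAM and swap below this: send SIGTERM
KILL_PERCENT = 5.0   # both RAM and swap below this: send SIGKILL


def choose_signal(ram_free_pct: float, swap_free_pct: float) -> Optional[signal.Signals]:
    """Return the signal to send, or None if memory is not yet critical.

    Both RAM and swap must be below a threshold for it to apply.
    """
    if ram_free_pct < KILL_PERCENT and swap_free_pct < KILL_PERCENT:
        return signal.SIGKILL
    if ram_free_pct < TERM_PERCENT and swap_free_pct < TERM_PERCENT:
        return signal.SIGTERM
    return None


def pick_victim(oom_scores: Dict[int, int]) -> Optional[int]:
    """Return the PID with the largest oom_score, or None if there are none."""
    if not oom_scores:
        return None
    return max(oom_scores, key=lambda pid: oom_scores[pid])
```

For example, `choose_signal(8.0, 9.0)` selects SIGTERM, while `choose_signal(4.0, 50.0)` returns `None`, because swap is still mostly free even though RAM is nearly exhausted.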


(Note that some discussion is still happening on devel list)

(Note that some discussion is still happening on devel list)

Some? :sweat_smile:

My opinion: I've read most of the discussion on the devel list, and I'm not so sure that enabling this by default is a good idea.

Less than 10% of both RAM and swap free seems both too high and too low a bar. For example, assume 16 or 32 GBs of memory, which means 16 or 32 GBs of swap, the way anaconda sets things up by default. So, 10% of 16 or 32 gigs of RAM is still a lot of free memory, but 15-30 gigs of swap usage is definitely bad. So I think these heuristics need to either be tweaked, or anaconda must stop asking for such big swap partitions.
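
To make the arithmetic in this comment concrete, here is a small Python sketch. It assumes, as the comment does, that swap is sized equal to RAM; the function name `threshold_snapshot` is hypothetical and it only multiplies out the 10% figure.

```python
def threshold_snapshot(ram_gb: float, swap_gb: float, threshold_pct: float = 10.0) -> dict:
    """Describe memory state at the moment both RAM and swap hit the threshold."""
    free_fraction = threshold_pct / 100.0
    return {
        "ram_free_gb": ram_gb * free_fraction,          # RAM still free at the threshold
        "swap_free_gb": swap_gb * free_fraction,        # swap still free at the threshold
        "swap_used_gb": swap_gb * (1 - free_fraction),  # swap already in use
    }


# Assumption from the comment: swap is the same size as RAM.
for size_gb in (16, 32):
    snap = threshold_snapshot(size_gb, size_gb)
    print(
        f"{size_gb} GB RAM + {size_gb} GB swap: "
        f"{snap['ram_free_gb']:.1f} GB RAM free, "
        f"{snap['swap_used_gb']:.1f} GB swap used"
    )
```

Under these assumptions the SIGTERM threshold is only reached once roughly 14 to 29 GB of swap are already in use, which is the tension the comment points at.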

Also, with oomd, earlyoom, etc. all still being actively developed and tweaked (and with an oom "killer" of some form or shape to be integrated into systemd within the next 1-2 years), I think it's safer to leave this as an optional feature until at least most of the issues have been worked out.

I'm ok with workstation working group enabling it, but yeah, it seems like things are moving around a lot in development here and we have lived without it for this long, so perhaps it's better to wait? But I wouldn't stop them from doing it if they would really like to. I'd keep it off in other install paths for now.

I would prefer to have a self-contained change introducing earlyoom to Fedora before having a system-wide change enabling it by default.

This would give it more publicity and some user base, especially among people struggling with memory limits. These are more likely to read release notes. But it wouldn't change anything for people who have never thought about this topic before and are not interested in understanding what oom handling does to their system, or how.

I am mostly concerned that we need to somehow educate users, and give a heads-up to anyone experiencing an OOM event that it leaves the system in a responsive but inconsistent state, one that needs some attention. Currently there is nothing in the earlyoom service that solves this.

We can work on the consistency topic through KillMode and OOMPolicy-like options and cgroup scopes for desktop services and processes, but we are far from done there.

Workstation working group discussed this in today's meeting. Summary: WG supports earlyoom enabled by default for Fedora Workstation 32 (+1:4, 0:4, -1:1)
https://pagure.io/fedora-workstation/issue/119#comment-620909
https://meetbot.fedoraproject.org/fedora-meeting-2/2020-01-15/workstation.2020-01-15-10.49.html

Metadata Update from @dcantrel:
- Issue tagged with: meeting

3 months ago

I'll miss today's meeting unfortunately.

I have the feeling that we'll likely have a different, and significantly better, solution in the relatively near future. Enabling earlyoom.service for just a short time (one or two Fedora releases) does not seem very attractive, in particular because it is a relatively heavy-handed solution that will distract from finding a better solution. But it is not terrible, so I'm ±0 at this time. If people feel that they want to give earlyoom a try, I completely understand that.

I had earlyoom enabled on my system over the last two weeks. It didn't kill anything :)

I am ok with letting the Workstation group decide for themselves. +0 from me.

bookwar makes good comments about introducing this to users more and making sure everyone understands it before making it a default. zbyszek also makes a good point that other similar solutions are in the works and might offer a better option than earlyoom will. Personally, I do not feel this is a good default for all systems but would prefer it to be present for users wishing to enable it (that is, what we do now). A service like earlyoom feels like a system papering over one or more real problems. I am continuing to read up on the different oom userspace options as well as running earlyoom locally, but for now I do not think this is a good default for all systems.

-1

I've been using earlyoom on my low-RAM (<=4GB) systems for a while and it, along with Firefox (>68) instead of Chrome, is making a real-world difference. The kernel oom is really bad at killing processes, esp. browsers, and makes the system seem totally hung. Can we just make it default to 'enabled' in Anaconda when RAM is <=4GB, at least?

+1

Hi FESCo, I think the WG will need even more time for further discussion. We're still not all on the same page after this week's meeting, and since your meetings are a day before ours, that's where we're going to be at the time of next week's FESCo meeting.

I think we have consensus that we intend to prioritize system responsiveness by default. Responsiveness is a good default for virtually all systems. But we're still considering whether earlyoom is what we want, or whether we can achieve the goal using other solutions. We're also considering what our acceptable timeframe is.

Well we have a discussion with Dan Schatzberg from Facebook scheduled for February 4, so that's after the next two FESCo meetings. That's a lot of time for further discussion. So far the WG has only decided that it wants to make some decision itself, so I suggest deferring this.

At this point, I'm starting to think the F32 timeframe is rather aggressive, but it could still work out if we defer change approval only three or four more weeks or so.

I agree, it increasingly looks like it would be better to withdraw this proposal for now, or postpone it to F33. I'd rather not rush changes that will be enabled on every new Fedora Workstation install ...

I'm afraid I was myself very confused. I missed our Jan 14 meeting, where we actually approved enabling earlyoom. We want to use it temporarily until a better future is available. That better future needs to do at least as well as earlyoom at maintaining desktop responsiveness. We recognize earlyoom as a necessary hack until the rest of the OOM ecosystem improves.

The WG is currently hearing from invited experts to provide more information on OOM handling. This week's invited expert was opposed to enabling earlyoom even temporarily, and I misunderstood the outcome of that meeting to indicate the WG itself was no longer keen on earlyoom.

Alright, votes in ticket are (+0, ±2, -1), so we'll discuss this during the meeting next week.

Sorry about the confusion with this issue. Originally this was an ordinary change/feature proposal.

Since then, the WG widely agreed it's something that shouldn't be punted to FESCo (6a) and a majority agreed to proceed with its inclusion (6c), per the minutes. Of course, FESCo can override the WG decision. Resource control is a distribution wide concern.

https://meetbot.fedoraproject.org/fedora-meeting-2/2020-01-15/workstation.2020-01-15-10.49.html

The WG isn't desperate to do something for Fedora 32, but it recognizes that users experiencing these problems are acting desperately when they quickly reach for the power button as their go-to workaround. Does earlyoom always solve that? Nope. It's very naive. And in fact if it triggers, it's perhaps only a somewhat graceful failure, because the user's job fails. What they really want is for the job to succeed while retaining a responsive GUI. But if the job is otherwise going to fail by power button, why not send it a SIGTERM instead? In that case system responsiveness returns, and the user's other work is preserved rather than clobbered by the power button.

Anyway, I plan to attend the 3 Feb FESCo meeting so I can answer questions. Or alternatively, push this issue to 10 Feb, if preferred. The WG has another domain expert as guest for our 4 Feb meeting, and I'll likely have more to report afterward.

Like @ckrzen I have a system that regularly experiences catastrophic OOM issues, in my case taking the form of paging cascades which hang the desktop near-indefinitely until something is finally killed. (I have 8GB of RAM, but Chrome is regularly using like THREE of them all by itself, so...)

I have noticed that in recent releases (since Fedora 29/30-ish, I'd say?) the responsiveness of the console seems to have increased greatly, such that I can now usually hit Ctrl+Alt+F3 to switch over to a console getty, log in that way, and deal with the problem — in the past, even those actions would take multiple minutes to complete.

(I've also noticed that the kernel OOM manager does seem to get "kicked awake" simply as a consequence of switching to the console, such that by the time I've logged in it's often already taken action. Which is great, but it'd still be nice if I didn't have to initiate the process myself. But it doesn't seem inclined to do anything about the problem, as long as my desktop session is still occupying the display.)

Point is, I'm very much in the target audience for this feature. But, I still find myself agreeing with @bookwar:

I would prefer to have a self-contained change introducing earlyoom to Fedora before having a system-wide change enabling it by default.

Get EarlyOOM in front of those of us who can make use of it, as an optional feature. We'll enable it (some of us), and provide feedback that can help make the decision about enabling it by default.

That indeed sounds like a reasonable thing to do, at least in F32.

There's already an earlyoom package, since Fedora 27. You can install it today and provide feedback.

@catanzaro Huh, so there is! Installed.

(...And enabled/started.)

I notice it wasn't enabled by default on install — it seems like it could be, depending how strictly you interpret the first point of the Guidelines section on enabling services by default:

Must not alter other services

Installation of the package providing the unit auto-started by this preset MUST NOT change the behavior of any other service running (or potentially running) on the system.

Normally earlyoom wouldn't affect other services, unless you count possibly killing them if the system is critically memory-starved. 😏

Anyway, now we wait...

I posted a call for testing user space OOM managers in August 2019 on devel@. And on 1 Sep I wrote, "Should Fedora enable earlyoom in Fedora 32 Workstation? Maybe."

the WG widely agreed it's something that shouldn't be punted to FESCo (6a) and a majority agreed to proceed with its inclusion (6c), per the minutes

That means that the Change is approved, for Workstation. I'm fine with that, and it doesn't seem FESCo has any reason or desire to override the decision.

  • 2320 F32 System-Wide Change: Enable EarlyOOM (decathorpe, 15:09:19)
  • FESCo acknowledges the Workstation WG's decision and does not want
    to overrule (decathorpe, 15:13:33)

Sorry for the delay.

Metadata Update from @decathorpe:
- Issue close_status updated to: Accepted
- Issue status updated to: Closed (was: Open)

2 months ago
