#127 enable swap-on-ZRAM by default
Closed: Fixed 2 years ago by chrismurphy. Opened 2 years ago by chrismurphy.

Summary: During startup, create a ZRAM device and activate swap on it. This can coexist with an existing swap partition, so it can apply to both new installations and upgrades. Planning for Fedora 33.

Questions:
- Enable for upgrades? Or only new installations? If enabled on upgrade, system likely will have two swaps.
- What should the size be? Fixed? Or percentage? How to determine this objectively?
- What implementation to use? There are three. Two packages are installed by default on Fedora Workstation: anaconda-core (provides zram.service), and zram (provides zram-swap.service). Yes, a bit confusing. Part of the dilemma is they all work, so a vastly superior solution doesn't exist.

Scope:
This can't strictly be a Fedora Workstation change. It needs to exist everywhere Anaconda exists: DVD, netinstall, Lives. That's because the idea is to provide a solution allowing Anaconda to deprecate their own implementation, leaving Workstation (and other desktops) with one implementation installed.

The systemd zram-generator is perhaps slightly more straightforward to include everywhere by default. It's activated by the existence of a configuration file. One option is to not include a config by default Fedora-wide, allowing each product to supply its own. Alternatively, provide a sane default config everywhere, and each product can choose to exclude/delete it.
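
As a rough illustration of the "activated by the existence of a configuration file" behavior, the sketch below writes a minimal config into a scratch directory. The `zram-size` key and its expression syntax follow current upstream zram-generator documentation and may differ in older releases. The function name and the scratch directory default are assumptions made for this example. On a real system the file would be /etc/systemd/zram-generator.conf.

```shell
#!/bin/sh
# Sketch only: writes a sample zram-generator config into a scratch directory.
# write_zram_config and the default directory are hypothetical; a real system
# would use /etc/systemd/zram-generator.conf.
write_zram_config() {
    conf_dir="${1:-./zram-generator-demo}"
    mkdir -p "$conf_dir" || return 1
    cat > "$conf_dir/zram-generator.conf" <<'EOF'
# Each [zramN] section asks the generator to create one device.
[zram0]
# Size in MiB; "ram" stands for total memory in MiB.
zram-size = min(ram / 2, 4096)
EOF
    # Print the path of the file that was written.
    printf '%s\n' "$conf_dir/zram-generator.conf"
}
```

A product that wants to opt out would simply not ship (or would delete) this file.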

Related devel@ thread.

Related issues: #54, #98, #119, #120, #121

Related stakeholders and interested parties: @hadess @jkonecny @zbyszek @dustymabe @pbrobinson


Firstly, see https://www.kernel.org/doc/Documentation/blockdev/zram.txt.

Fixed? Or percentage? How to determine this objectively?

disksize = 100% MemTotal is OK. Why not?

A large swap size on zram does not take up disk space; however, an increase in disksize reduces the available memory.

Note:
There is little point creating a zram of greater than twice the size of memory
since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
size of the disk when not in use so a huge zram is wasteful.

0.1% of the size of the disk

I measured 0.42%: with a 10 GiB disksize, MemAvailable decreases by about 45 MiB.
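
To make the two overhead figures easier to compare, here is a small arithmetic sketch. The function name is made up for this example, and it only multiplies the disksize by a percentage; it does not measure anything on a running system.

```shell
#!/bin/sh
# zram_overhead_mib DISKSIZE_MIB PERCENT
# Prints the idle bookkeeping overhead in MiB for a given overhead percentage.
zram_overhead_mib() {
    awk -v size="$1" -v pct="$2" 'BEGIN { printf "%.1f\n", size * pct / 100 }'
}

zram_overhead_mib 10240 0.1    # figure from the kernel documentation
zram_overhead_mib 10240 0.42   # figure measured in this comment
```

For a 10 GiB device this gives roughly 10 MiB with the documented 0.1% and roughly 43 MiB with 0.42%, which is in line with the "about 45 MiB" observation.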

we expect a 2:1 compression ratio

By the way, this may be 1.1:1, and may be 100:1. This is highly dependent on the type of data being compressed. 2:1 is just the expected average. When filling memory with browser tabs, I usually observe a compression ratio of 3:1.

Afaik, ChromeOS uses disksize = 150% MemTotal (it is very likely that I am wrong).

Large swap space on hard disk is not good, because it uses disk space.

Why should swap space on zram with disksize = 100-150% of MemTotal be bad?

I've been using a 1:1 ratio, up until recently. I don't think it's bad, when everything plays nicely.

Lately I've capped the ZRAM device at 3G. The smaller size shortens the time to OOM kill in the case where the workload is simply asking for too many resources, whereas a larger size increases the time to OOM and increases the chance of a totally wedged system compared to swap on a drive. So the small size is a conservative approach while collecting more data.

100% of a page swapped to a drive is freed from memory, whereas with ZRAM it actually increases memory demand. Yes it's a fraction of its original size, but it's not a complete page out. As the system becomes more swap dependent, swap on ZRAM actually takes RAM away from the system (at a rate that depends on the workload and the compression ratio). There is no free lunch here. If it's 50% free, you're still paying the other 50% :D
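
The "no free lunch" point can be put into numbers with a small sketch. The function name is hypothetical, and the model is deliberately simplified: it ignores zram's own metadata overhead and assumes a uniform compression ratio.

```shell
#!/bin/sh
# zram_net_freed_mib SWAPPED_MIB RATIO
# Swapping SWAPPED_MIB of pages to zram at compression ratio RATIO:1 still
# keeps SWAPPED_MIB / RATIO in RAM, so only the difference is actually freed.
zram_net_freed_mib() {
    awk -v s="$1" -v r="$2" 'BEGIN { printf "%.0f\n", s - s / r }'
}

zram_net_freed_mib 1024 2     # 2:1, the expected average
zram_net_freed_mib 1024 1.1   # nearly incompressible data
```

At 2:1, paging out 1024 MiB frees about 512 MiB. At 1.1:1 it frees only about 93 MiB, whereas a drive-backed swap would free the full 1024 MiB.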

Unlike real partitions, it's easy for us to change this as we learn more. We could include a comment in the configuration file that while the user is invited to experiment with different sized ZRAM devices, that the configuration file is "vendor owned" and thus can be replaced at any time in favor of new defaults.

  • Enable for upgrades? Or only new installations? If enabled on upgrade, system likely will have two swaps.

I would still enable it for upgrades: it doesn't cause any data loss, and it's straightforward to disable.

  • What should the size be? Fixed? Or percentage? How to determine this objectively?

I went with the default of 50% of RAM size, but without a disk swap, and I've not had wedges.
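
The sizing options discussed so far (a percentage of RAM, optionally capped at a fixed size) can be sketched as below. Both function names are invented for this example, and the percentages and the 3072 MiB cap are just the values mentioned in this thread, not a recommendation.

```shell
#!/bin/sh
# zram_size_mib MEMTOTAL_MIB PERCENT [CAP_MIB]
# Prints PERCENT of MEMTOTAL_MIB, limited to CAP_MIB when a cap is given.
zram_size_mib() {
    mem="$1"; pct="$2"; cap="${3:-0}"
    size=$(( mem * pct / 100 ))
    if [ "$cap" -gt 0 ] && [ "$size" -gt "$cap" ]; then
        size="$cap"
    fi
    echo "$size"
}

# memtotal_mib [MEMINFO_FILE]
# Prints MemTotal in whole MiB, read from /proc/meminfo by default.
memtotal_mib() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Example: 50% of RAM, capped at 3072 MiB:
#   zram_size_mib "$(memtotal_mib)" 50 3072
```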

  • What implementation to use? There are three. Two packages are installed by default on Fedora Workstation: anaconda-core (provides zram.service), and zram (provides zram-swap.service). Yes, a bit confusing. Part of the dilemma is they all work, so a vastly superior solution doesn't exist.

I'd prefer the systemd zram-generator to be used (it won't get merged into systemd because it's written in Rust, but you can consider it part of systemd's ecosystem). Adding a requires for a subpackage containing the Workstation configuration file should be straightforward enough.

I'm pondering some conflict avoidance. Anaconda folks need to be able to depend on zramwhatever existing, before they can remove/suppress their implementation. Also, zram package (provides zram-swap.service) is installed in Workstation by default, but disabled, in F31 and Rawhide.

Draft proposal (i.e. please poke holes in this, refine it, not a voting proposal):

1. Remove zram from the Rawhide package set ASAP, so clean F32 installs don't have it.

2. Add zram-generator to the Rawhide package set now so clean F32 installs (and upgrades?) do have it, but it's also disabled by default (no config file). We can also encourage early testing during F32 by just telling the user to copy a configuration file into /etc.

3. In ~6 months, F31/32 -> F33 upgrade: in case zram (zram-swap.service) is present and enabled, run a script that does 'systemctl preset zram-swap.service' to make sure it's disabled. We don't want two ZRAM devices. This will reset to vendor default, disabled everywhere except IoT (unless they agree to go with zram-generator by this time).
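
Step 3 could look roughly like the sketch below. The function name and the `SYSTEMCTL` override are assumptions added so the logic can be dry-run; the only real commands used are `systemctl is-enabled` and `systemctl preset`.

```shell
#!/bin/sh
# reset_unit_to_preset UNIT
# If UNIT is currently enabled, reset it to the vendor preset.
# SYSTEMCTL is a hypothetical override used for dry runs and testing.
reset_unit_to_preset() {
    unit="$1"
    ctl="${SYSTEMCTL:-systemctl}"
    # is-enabled exits 0 for enabled units (and a few related states such
    # as static); a missing or disabled unit gives a non-zero exit status.
    if $ctl is-enabled --quiet "$unit" 2>/dev/null; then
        $ctl preset "$unit"
    fi
}

# Example (as root, during the upgrade):
#   reset_unit_to_preset zram-swap.service
```

Whether the unit ends up disabled afterwards depends on the preset files shipped by the edition, which is the point made above about IoT.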

Given F33 upgrades must contend with both F31 and F32, there's perhaps minimal advantage in fixing F32. And I'll just make a single bigger change once F33 branch happens.

fedora-comps/comps-f32.xml.in contains four references to zram package in the following groups:

anaconda-tools: remove, this is superfluous/conflicting with both their own implementation as well as zram-generator
workstation-product: s/zram/zram-generator
system-tools: s/zram/zram-generator
arm-tools: no change, unclear whether ARM folks are willing to move to zram-generator
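
The two s/zram/zram-generator substitutions could be applied per group with something like the sketch below. It assumes each comps group carries an `<id>` line followed by `<packagereq ...>zram</packagereq>` entries; the function name is made up, and removing the entry from anaconda-tools would still be a separate manual edit.

```shell
#!/bin/sh
# swap_zram_in_groups GROUP_IDS
# Reads comps XML on stdin and writes it to stdout, replacing the zram
# package with zram-generator only inside the comma-separated GROUP_IDS.
swap_zram_in_groups() {
    awk -v groups="$1" '
        BEGIN { n = split(groups, g, ","); for (i = 1; i <= n; i++) want[g[i]] = 1 }
        # Track which group we are in by its <id> line.
        /<id>/ {
            id = $0
            sub(/.*<id>/, "", id)
            sub(/<\/id>.*/, "", id)
            active = (id in want)
        }
        # Only rewrite exact "zram" entries, never "zram-generator".
        active { sub(/>zram<\/packagereq>/, ">zram-generator</packagereq>") }
        { print }
    '
}

# Example:
#   swap_zram_in_groups "workstation-product,system-tools" < comps-f32.xml.in
```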

Just my 2 cents: I have been using zram for a long time and only now realized that, in my case and with my configuration, things are better without it. After more than 2 hours of uptime the desktop becomes sluggish and less responsive. This is on a 4-core AMD CPU (comparable to an old i5-2400) with 8 GB RAM.

There are also some complaints on Windows, where users prefer to turn off compression even on decent mid-range/high-end systems: https://www.reddit.com/r/thedivision/comments/csxv15/att_devs_is_memory_compression_having_issues_with/ And I can confirm that disabling memory compression on both Windows and Fedora helps me prevent micro FPS drops and micro-stuttering in some games.

zram is definitely not always the best choice, and not for all cases.

Metadata Update from @chrismurphy:
- Issue tagged with: meeting

2 years ago

Summary+status:

  • Fedora IoT has been using it for ~2 years by default. No major issues.
  • Anaconda has been using it by default on DVD/netinstall for ~2 years and on Lives ~1 year, for RAM < 2048M. That might seem rare, but it's what openQA VMs use for test installations.
  • systemd/zram-generator is mostly working; I've filed issues upstream and will keep an eye on it.

WG agreed to a test day. It needs two leads to coordinate with Fedora QA, GNOME, Anaconda, and others as appropriate, to have a sufficiently broad sample pool, not just idealized systems.

Assign @chrismurphy and @catanzaro for this.

Part of this test day will be writing up how to install systemd/zram-generator:

  • it is a module in f30,f31, and a regular package in f32
  • it needs to be rebased on upstream once a few things are fixed/added
  • broadest test possible, invite f31,f32,f33 testers

Test cases:

  • S3 suspend-to-RAM conflicts
  • Simulate battery death while S3 suspend
  • Interaction with and without earlyoom
  • Dozens of Firefox tabs to fill both memory and swap
  • webkitgtk torture test
  • swap-on-ZRAM plus swap partition (the likely configuration following an upgrade); i.e. two swap devices
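
For the two-swap-device test case, it may help to confirm which device the kernel will use first. The sketch below picks the highest-priority entry from /proc/swaps; the function name is invented for this example, and it assumes the usual five-column layout with the priority in the last column.

```shell
#!/bin/sh
# preferred_swap [SWAPS_FILE]
# Prints the swap device with the highest priority, which the kernel
# fills first. Reads /proc/swaps by default.
preferred_swap() {
    awk 'NR > 1 {
        p = $5 + 0
        if (best == "" || p > prio) { best = $1; prio = p }
    }
    END { if (best != "") print best }' "${1:-/proc/swaps}"
}

# Example:
#   preferred_swap
```

On an upgraded system with both devices active, testers would hope to see the zram device reported here rather than the partition.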

Any additions?

Metadata Update from @chrismurphy:
- Issue assigned to chrismurphy

2 years ago

Interaction with and without earlyoom

earlyoom with swap on zram will not work when memory is filled with incompressible data (or with a huge disksize): the system hangs with high SwapFree.

earlyoom with swap on zram will not work when memory is filled with incompressible data (or with a huge disksize): the system hangs with high SwapFree.

What if the ZRAM device size is 50% RAM size (or less)?

system hangs with earlyoom and swap on zram, disksize=150% MemTotal, compression ratio at the end is 1.4, SwapFree=25%, the zram device occupies 80% of the memory with running Blender https://imgur.com/a/wNvBVIn.

With disksize=100% MemTotal this should be a very rare case. With 50% it may be impossible in practice (unless you fill memory with pure urandom data), IMHO.
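
Figures like the compression ratio and the RAM held by the device can be read from the device's `mm_stat` file, whose first three fields are the original data size, the compressed data size, and the total memory used, all in bytes. The helper names below are made up for this sketch.

```shell
#!/bin/sh
# Both helpers read a single mm_stat line on stdin.

# zram_ratio: original size divided by compressed size.
zram_ratio() {
    awk '$2 > 0 { printf "%.2f\n", $1 / $2 }'
}

# zram_mem_used_mib: RAM currently held by the device, in MiB.
zram_mem_used_mib() {
    awk '{ printf "%.0f\n", $3 / 1048576 }'
}

# Example:
#   zram_ratio < /sys/block/zram0/mm_stat
#   zram_mem_used_mib < /sys/block/zram0/mm_stat
```

Comparing the second value against MemTotal shows how much of the system's memory the device occupies, independent of what SwapFree reports.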

Metadata Update from @chrismurphy:
- Issue untagged with: meeting

2 years ago

Any objection to me submitting an F33 change proposal for this? I'd make clear the WG wants to see test results before backing the change, but I think a change proposal gives it more visibility early on to poke holes in, and they often do have Test Days attached anyway.

A full zram vs zswap comparison has not been provided yet.

Any objection to me submitting an F33 change proposal for this?

No.

Questions:
- What should the size be? Fixed? Or percentage? How to determine this objectively?

Also: mem_limit, num_devices, comp_algorithm default values.
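
For reference, these knobs map onto a manual setup roughly as follows. The sketch only prints the commands (a dry run) so they can be reviewed before anything is run as root. The function name is hypothetical, and the algorithm, sizes, and swap priority of 100 are example values, not proposed defaults.

```shell
#!/bin/sh
# zram_swap_plan SIZE_MIB ALGORITHM [MEM_LIMIT_MIB]
# Prints the commands for a manual single-device swap-on-zram setup.
zram_swap_plan() {
    size_mib="$1"; algo="$2"; limit_mib="${3:-0}"
    echo "modprobe zram num_devices=1"
    # The algorithm has to be chosen before disksize initializes the device.
    echo "echo $algo > /sys/block/zram0/comp_algorithm"
    echo "echo ${size_mib}M > /sys/block/zram0/disksize"
    # mem_limit is optional; it caps the RAM the device itself may use.
    if [ "$limit_mib" -gt 0 ]; then
        echo "echo ${limit_mib}M > /sys/block/zram0/mem_limit"
    fi
    echo "mkswap /dev/zram0"
    echo "swapon --priority 100 /dev/zram0"
}

zram_swap_plan 4096 lzo-rle 2048
```

Note the difference between the two sizes: disksize is the uncompressed capacity offered as swap, while mem_limit bounds the compressed footprint in RAM.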

A full zram vs zswap comparison has not been provided yet.

zram is a simpler change, systemd/zram-generator. It's also more generic across all of Fedora editions and spins.

zswap, we don't currently have a way to set the kernel command line options; I guess some service could insert this on first boot, to handle both new installs and upgrades?

zswap, we don't currently have a way to set the kernel command line options

We do not need it, see https://github.com/torvalds/linux/blob/master/Documentation/vm/zswap.rst.

Just run:

#!/bin/sh
# enable zswap
echo 1 > /sys/module/zswap/parameters/enabled
echo z3fold > /sys/module/zswap/parameters/zpool
echo 50 > /sys/module/zswap/parameters/max_pool_percent
echo lzo-rle > /sys/module/zswap/parameters/compressor

#!/bin/sh
# disable zswap
echo 0 > /sys/module/zswap/parameters/enabled

Check result:

# grep -R . /sys/module/zswap

using zbud for now since z3fold has some issues still being worked out

Have problems been fixed in z3fold?

BTW, zswap is still experimental.

For this reason, zswap is a work in progress and should be considered experimental.

OK, maybe zram is OK because zswap is experimental.

Good point about just using the sysfs interface to set up zswap, I always forget this for some reason. It could be set up with either a systemd generator or a service unit. Hypothetically zswap's LRU eviction is better, being designed specifically for the swap use case, whereas ZRAM devices are intended to be generic compressed RAM disks that you can format like any other block device. But in practice, I've discovered no difference.

I asked upstream about zswap's experimental status last summer, and they said zbud was imminently production ready, whereas they thought z3fold probably needed a bit more time in the oven.

I'll ask upstream for some tests to compare/evaluate zswap vs zram for our use case; and also its experimental status.

Since zswap needs to be backed by a conventional swap partition, it's possible user data leaks onto persistent media; whereas swap-on-ZRAM avoids this on new installs by not creating a conventional swap partition. I'd say for now, focus on the zram approach, and keep zswap in mind for a fallback or future feature once encrypted+signed swap appears.

Any objection to me submitting an F33 change proposal for this? I'd make clear the WG wants to see test results before backing the change, but I think a change proposal gives it more visibility early on to poke holes in, and they often do have Test Days attached anyway.

I guess now would be a good time to get zram working if we're planning to have it in F33. Any updates here?

I'm working on the change proposal today all day until it's done, at least in draft form for preview.

In some sense there's more than one proposal. I tried drafting two proposals (just outlining the logic) and it was even worse and more confusing than just one proposal. There are no technical impediments, the difficulty is all a technical-writing, prose, flow problem :D This is familiar territory for me.

"Fedora Workstation Edition anticipates full opt-in by default" is really confusing, better to just say "we will enable this feature"

Many clean-ups have happened, including that. I've also started redoing the footnotes/references; they're super annoying right now. I prefer markdown, but the wikitext format has some features that I can use to make this look far less cluttered. Right now it's in an in-between state.

@chrismurphy This change should be system-wide for all Fedora variants.

@hakavlad What about deconfliction? It's up to the user to deconflict or they get both ZRAM and zswap based swapping? I'd ask @zbyszek if there's a chance of doing this in zram-generator (zram-zswap-generator) so that there's deconflict logic possible. Maybe a switch in the configuration file to indicate preference. Prefer zswap if swap-on-disk exists/is active; falling back on swap-on-ZRAM?
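
The preference suggested here (zswap when a disk-backed swap is active, otherwise swap-on-ZRAM) could be sketched as below. This is not something zram-generator does. The function name is invented, and the check simply treats any /proc/swaps entry that is not a /dev/zram device as disk-backed.

```shell
#!/bin/sh
# choose_swap_backend [SWAPS_FILE]
# Prints "zswap" if any active swap device is not a zram device,
# otherwise "zram". Reads /proc/swaps by default.
choose_swap_backend() {
    awk 'NR > 1 && $1 !~ /^\/dev\/zram/ { disk = 1 }
         END { print (disk ? "zswap" : "zram") }' "${1:-/proc/swaps}"
}

# Example:
#   choose_swap_backend
```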

I do like the idea of making it easier for users to experiment with these things, and to answer the question of which workloads favor one or the other.

I think right now there is a slight advantage for swap-on-ZRAM because it's volatile. So the concerns about user data leaks into swap are obviated. Whereas with zswap, they're still present.

Anyway, also saw this in kernel 5.7 notes:
zswap: allow setting default status, compressor and allocator at compile time commit.

I've incorporated feedback received, reorganized the flow, and simplified the summary. The status has been set to ready for wrangler. Hold on to your butts!
https://fedoraproject.org/wiki/Changes/SwapOnZRAM

Now to schedule the test day fedora-qa#632 And write up some test cases. @catanzaro What do you think about the time frame?

The sooner, the better.

@-ing you in an issue doesn't subscribe you. Could you subscribe to fedora-qa#632 and we'll coordinate with others there? I proposed June 10.

@hakavlad What about deconfliction? It's up to the user to deconflict or they get both ZRAM and zswap based swapping?

I think zram and zswap services creators should take care of this.

For example

# disable zswap (not recommended to combine zram and zswap)
echo 0 > /sys/module/zswap/parameters/enabled

https://github.com/hakavlad/yazram/blob/master/zram-on
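
A service that wants to warn instead of silently disabling zswap could first check whether both mechanisms are active. The sketch below is one way to do that. The function name is made up, and the two file arguments exist only so the check can be exercised against sample files.

```shell
#!/bin/sh
# zswap_zram_conflict [ZSWAP_ENABLED_FILE] [SWAPS_FILE]
# Exit status 0 when zswap is enabled and a zram swap device is active.
zswap_zram_conflict() {
    enabled_file="${1:-/sys/module/zswap/parameters/enabled}"
    swaps_file="${2:-/proc/swaps}"
    # No readable parameter file means zswap is not available at all.
    [ -r "$enabled_file" ] || return 1
    # The boolean module parameter reads back as Y or N.
    [ "$(cat "$enabled_file")" = "Y" ] || return 1
    grep -q '^/dev/zram' "$swaps_file"
}

# Example:
#   zswap_zram_conflict && echo "zswap and swap-on-zram are both active"
```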

I notice that all of the variations of swap-on-zram using systemd service units make a lot of assumptions that they are the first or only unit doing this. And they fail because zram-generator beat them to it, since it runs earliest.

I'd rather see them all just go away, and either enhance zram-generator, or create a new generator, that can properly negotiate with fstab-generator, gpt-auto-generator, and zram-generator to do the right thing with respect to zram and zswap. I'm pretty sure this belongs in zram-generator, and the upstream developers are agreeable to this. But it's quickly getting outside the scope of an F33 time frame, in which case I'll just narrow the scope of the (swap-on-zram) feature to new clean default installations.

swap on zram test day is today, July 6
https://fedoraproject.org/wiki/Test_Day:F33_SwapOnZRAM

I expect to have a very brief report on the results at the July 7 meeting, and to recommend that we vote to approve this feature in ticket.

Test day results
https://testdays.fedorainfracloud.org/events/86

Hibernation is a bit weak, possibly my fault due to using debugging instructions instead of just doing systemctl hibernate - but there are three highly reliable sources for successful tests, and no known regressions.

Let's vote in the ticket.

Er, we never formally approved this?

+1 approve

✔ Use zram-generator instead of zram
https://pagure.io/fedora-comps/pull-request/513

✔ Replace 'zram' with 'zram-generator', and exclude Cloud edition
https://pagure.io/fedora-kickstarts/pull-request/658

Obsolete zram package from zram-generator-defaults
https://src.fedoraproject.org/rpms/rust-zram-generator/pull-request/3#

✔ Replace the zram service (draft)
https://github.com/rhinstaller/anaconda/pull/2727

✔ Don't create swap by default
https://github.com/rhinstaller/anaconda/pull/2723/

+1

That's six plus votes, so let's call this passed.

This is mostly done; there are a couple of loose ends:

  • Upgrades. Current plan is for zram-generator-defaults to obsolete zram. And zram is installed by default on Workstation since Fedora 31. That means anyone with a clean install as recent as Fedora 31 will get this feature. We could consider expanding this in Fedora 34 if DNF 5 makes it easier to add new things on upgrade.
  • GNOME Disks shows the /dev/zram0 device, and reports its Contents as "Unknown". I filed a bug.
    https://gitlab.gnome.org/GNOME/gnome-disk-utility/-/issues/177

The 'Obsolete zram package from zram-generator-defaults ' merge hasn't happened yet, but I've bumped that PR to let them know it's ready for merge. It really only affects upgrades. I think we can call this issue closed, and I'll track the obsolete from the QA side of things.

Metadata Update from @chrismurphy:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 years ago
