#307 swaponzram for the cloud base image
Opened 5 months ago by dustymabe. Modified 4 months ago

@chrismurphy has reached out to us to evaluate swaponzram by default for the cloud base image. More information:

We discussed this during the cloud meeting today. It seems the jury is still out and some of us would like to request more information before making a decision. We'll discuss again in the next meeting.

The configuration for the zram-generator has a setting:

# The maximum amount of memory (in MiB). If the machine has more RAM
# than this, zram device will not be created.
# "host-memory-limit = none" may be used to disable this limit. This
# is also the default.
host-memory-limit = 9048

So hosts with more than $host-memory-limit RAM will see no change if we were to implement this.

We are also free to change $host-memory-limit from the default shipped by the package if we want to.

Because of $host-memory-limit, we can separate the problem space into two buckets:

  • large instances (> $host-memory-limit RAM)
    • will observe no change in behavior if we enable swaponzram
  • small instances (<= $host-memory-limit RAM)
    • For these it would be nice to have more data about the benefits, i.e. at what size does it become beneficial? Is there a value at which it has a negative impact?
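
The two buckets can be sketched in a few lines of Python. This is only an illustration of the rule stated in the config comment, not zram-generator's actual code: the function names and the sample meminfo text are made up, and the 9048 default is simply the value from the snippet above.

```python
def mem_total_mib(meminfo_text):
    """Return MemTotal in MiB from /proc/meminfo-style text (the value is in kB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) // 1024
    raise ValueError("MemTotal not found")


def zram_would_be_created(ram_mib, host_memory_limit_mib=9048):
    """Apply the rule from the config comment.

    No device is created if the host has more RAM than the limit.
    None stands in for "host-memory-limit = none" (no limit).
    """
    if host_memory_limit_mib is None:
        return True
    return ram_mib <= host_memory_limit_mib


# Hypothetical sample input; on a real host, read /proc/meminfo instead.
sample = "MemTotal:        2097152 kB\nMemFree:          524288 kB\n"
small = mem_total_mib(sample)
print(small, zram_would_be_created(small))    # small instance: affected
print(16384, zram_would_be_created(16384))    # large instance: no change
```

With the assumed limit, a 2 GiB instance falls in the "small" bucket and a 16 GiB instance in the "large" one.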

The "In defence of swap" article is worth a read. It was written by an mm/cgroup2/systemd developer, who also had it reviewed by other cgroup2 folks.

A system provisioned to not swap at all is arguably memory over-provisioned. You've given it more memory than it needs, just to avoid swapping. That's fine. Nothing is wrong with that.

What is the metric to know if you'd benefit from swap-on-zram (which is not zswap, btw)? Look at page faults. They are in effect "swapping", except that it is file pages being dropped and then re-read, and it happens because there is no swap device to take anonymous page evictions. By evicting inactive anonymous pages, you make it possible to avoid that reclaim. Reclaim is just as expensive as "swap thrashing" when it's a drop<-->reload cycle. What you want instead is anonymous page out -> disk: a one-way eviction of pages that never get reloaded. They are mostly stale or seldom-used pages, so they can just go away.

swap-on-zram is better than no swap at all. But it's not as effective as a non-memory swap device because there is no free lunch: the compressed pages still consume memory, even if only a fraction of the original page size, whereas eviction to disk frees 100% of the page.
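
A rough back-of-the-envelope version of that trade-off, as a sketch. The 300 MiB figure and the 3:1 compression ratio are assumptions picked for illustration, not measurements, and `net_freed_mib` is a hypothetical helper name.

```python
def net_freed_mib(evicted_mib, compression_ratio):
    """Memory actually freed when evicted_mib of anonymous pages go to zram.

    The compressed copies still live in RAM, so only the difference is freed.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    stored_mib = evicted_mib / compression_ratio
    return evicted_mib - stored_mib


# Assumed numbers: 300 MiB of stale anonymous pages, hypothetical 3:1 ratio.
print(net_freed_mib(300, 3))  # zram: 200.0 MiB freed, 100 MiB still in RAM
# Eviction to a disk-backed swap device would free the full 300 MiB.
```

The real ratio depends on the workload and the compression algorithm, so the actual gain needs to be measured.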

How to test? Try to get your memory provisioning perfectly equal to the workload requirement, no more, and then run the workload. Watch major page faults go up with cat /sys/fs/cgroup/memory.stat

There's a bunch of refault stuff in there. Now merely enable swap-on-zram and rerun the workload. No more refaults (or significantly fewer anyway). That means a more efficient system.
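
To compare the two runs, one option is to snapshot memory.stat before and after each run and diff the fault counters. A minimal sketch follows. The counter names (pgmajfault, workingset_refault_anon, workingset_refault_file) are the ones recent cgroup2 kernels expose, and older kernels may only have a single workingset_refault. The sample snapshots are made-up numbers, and the helper names are hypothetical.

```python
REFAULT_KEYS = ("pgmajfault", "workingset_refault_anon", "workingset_refault_file")


def parse_memory_stat(text):
    """Parse 'key value' lines from a cgroup2 memory.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def fault_delta(before_text, after_text, keys=REFAULT_KEYS):
    """Return how much each counter grew between two snapshots."""
    before = parse_memory_stat(before_text)
    after = parse_memory_stat(after_text)
    # Counters missing from a snapshot are treated as 0.
    return {k: after.get(k, 0) - before.get(k, 0) for k in keys}


# Made-up snapshots; on a real host, read the cgroup's memory.stat twice.
before = "pgfault 1000\npgmajfault 10\nworkingset_refault_anon 0\nworkingset_refault_file 50\n"
after = "pgfault 9000\npgmajfault 410\nworkingset_refault_anon 0\nworkingset_refault_file 1250\n"
print(fault_delta(before, after))
```

Run it once without swap and once with swap-on-zram enabled. If the claim above holds for your workload, the pgmajfault and workingset_refault_file deltas should be much smaller in the second run.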
