#2077 How to request (slightly) larger memory for a build?
Closed: Fixed 2 years ago by praiskup. Opened 2 years ago by derkuci.

I wasn't having any luck building pypy3.8 or 3.9~rc1 packages (EPEL 8). Usually (e.g., on a local box) the build took less than 2 hours to finish. On Copr, after 6 hours, the build failed at the final linking step, complaining "/usr/bin/ld.gold: fatal error: /tmp/ccFkmvEddebugobj: Cannot allocate memory". See for example build 3349259.

My guess is that the available memory is less than the 4.5G suggested in the .spec file. The build log happens to run a free command, and it does seem to show low memory but abundant swap space:

+ free
              total        used        free      shared  buff/cache   available
Mem:        4007912      825508      156696     2896944     3025708      100976
Swap:     151291164      620032   150671132

I wonder if there's a way to request a little more memory for a build, or whether I am doing something improperly that leads to the large memory use. Thank you for any suggestions!


Commit 4fe6812f relates to this ticket

I wonder if there's a way to request a little more memory for a build

I'm afraid the answer is negative; we have a single performance category for all the
builders, and it is not possible to increase the RAM across the board.

Why doesn't swap help here? Could this be OOM-related?

@praiskup Thank you for turning off systemd-oomd. After your message, assuming that change was already in place, I resubmitted a build, but it failed again at the same linking point with "Cannot allocate memory" (build 3382615).

I don't really know why swap doesn't help here. I actually tried a) turning off LTO; b) using ld instead of gold (see the sketch below). Neither helped. But it could indeed relate to some changes in Copr, because similar builds (by other people) were quite successful. They were also much shorter, i.e., 1 hour vs. my 6 hours.
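
For reference, this is roughly what those two tweaks looked like in the spec (a sketch using the standard Fedora/EPEL bits; the exact placement depends on the package):

    # a) disable LTO for the whole package (spec preamble):
    %global _lto_cflags %{nil}

    # b) in %build, ask GCC to link with the classic BFD linker instead of gold:
    export LDFLAGS="$LDFLAGS -fuse-ld=bfd"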

@praiskup Thank you for turning off systemd-oomd. After your message, assuming that change was already in place,

I realized I had not pushed this to production at that moment; it did land in production, though, roughly 10 minutes after that message.

But it could indeed relate to some changes in Copr,

You mean a change in your Copr project, not a build-system change?

They were also much shorter, i.e., 1 hour vs. my 6 hours

This is weird, indeed.

So, in the very same bug it was suggested that we test vm.overcommit_memory = 2.
Can you please give the build another try?
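
For context, this is roughly how that knob is applied on a builder (a sketch; the sysctl.d file name is illustrative). Setting vm.overcommit_memory = 2 switches the kernel from heuristic overcommit to strict accounting, so allocations fail early instead of triggering the OOM killer later:

    # apply immediately on the running machine:
    sysctl -w vm.overcommit_memory=2

    # persist across reboots (file name illustrative):
    echo 'vm.overcommit_memory = 2' > /etc/sysctl.d/90-overcommit.conf
    sysctl --system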

Metadata Update from @praiskup:
- Issue assigned to praiskup

2 years ago

So, in the very same bug it was suggested that we test vm.overcommit_memory = 2.
Can you please give the build another try?

Thank you, and yes, I just started a new build.

But it could indeed relate to some changes in Copr,

You mean a change in your Copr project, not a build-system change?

I actually meant the build system, after reading the other thread you referred to. I didn't change my Copr project, and it has never successfully built these packages on Copr; I always got failures due to memory issues, except for once when the build was allocated to a VM with larger memory (build 03330365; search for "+ free" and you'll see the VM had 15G of memory, larger than the usual 4G).

I hope the vm.overcommit_memory change will fix the issue.

Thank you!

Unfortunately the new build failed with the same error.

Ok, thanks, any ideas are welcome :-/

As proposed in the mentioned bug, we are completely OK with providing root access to one builder virtual machine and letting anyone experiment with it (tweak the kernel, etc.).

Ok, thanks, any ideas are welcome :-/

Here are a few more observations:

  • churchyard submitted his latest pypy3.9 builds and they finished in 4 hours. These were on Fedora 34/35/36/rawhide, with GCC >= 11.2.
  • I submitted a copy of that pypy3.9 build (slightly adapted to EPEL 8), but it again failed after 6 hours: ld.gold failed to allocate memory. Red Hat EL8 ships GCC 8.5.
  • I submitted another build with GCC replaced by gcc-toolset-11, and that one passed the link stage (see the sketch after this list). It was killed at about 7 hours because I didn't set a larger timeout.
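
For anyone who wants to reproduce the gcc-toolset-11 switch, this is roughly what I added to the spec (a sketch; it assumes gcc-toolset-11-gcc and gcc-toolset-11-gcc-c++ are listed as BuildRequires):

    # at the top of %build: put GCC 11 first in PATH for the rest of the script
    . /opt/rh/gcc-toolset-11/enable
    gcc --version   # should now report 11.x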

To summarize:

  • Somehow the pypy3.9 builds take much longer on the current VMs, and not just my builds: churchyard's builds used to take 1 hour (on Fedora) and now take 4 hours.
  • GCC 8.5 on EL8 requires much more memory during the link stage; I'm not sure why. GCC 11 on EL8, by contrast, can pass that stage.
  • Building pypy3.9 on EL8 in general takes even longer, probably 8 hours.

Although I have no idea why the builds take so much longer now, I think the initial concern of this issue is mostly solved (by switching to GCC 11), and the swap-not-used problem is tracked in bug #2051154. So, in order not to take more of your time, I think this issue can be closed.

Metadata Update from @praiskup:
- Issue tagged with: bug

2 years ago

Somehow the pypy3.9 builds take much longer on the current VMs, and not just my builds: churchyard's builds used to take 1 hour (on Fedora) and now take 4 hours.

There are two kinds of builders: on-premise VMs with 4G RAM + lots of swap, and AWS VMs with 16G RAM + lots of swap. This is likely the reason; it is much more likely to get a
slower machine, since we use AWS only when the other builders aren't enough to process the queue quickly.

Although I have no idea why the builds take so much longer now, I think the initial concern of this issue is mostly solved (by switching to GCC 11), and the swap-not-used problem is tracked in bug #2051154. So, in order not to take more of your time, I think this issue can be closed.

Ok, I think we can keep tracking this in rhbz#2051154. Thank you for all the input!

Metadata Update from @praiskup:
- Issue close_status updated to: External
- Issue status updated to: Closed (was: Open)

2 years ago

Metadata Update from @praiskup:
- Issue status updated to: Open (was: Closed)

2 years ago

It seems the reason is that we have both zram0 swap and normal volume swap, but zram0 has a higher priority. This means that zram0 is used by default ... and then the kernel probably fails to move part of a swapped area over to the other volume.
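
This is easy to verify on a builder with swapon (the device names and sizes below are illustrative; the point is the priority column, since the kernel fills the highest-priority area first):

    swapon --show=NAME,TYPE,SIZE,PRIO
    # NAME       TYPE       SIZE  PRIO
    # /dev/zram0 partition    4G   100   <- filled first (highest priority)
    # /dev/vdb2  partition  144G    -2   <- the big swap volume, barely touched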

Commit 50e91eed relates to this ticket

Thank you. I started a build (pypy3.9 with gcc 8) to see if the link step could go through. Will report back.

Note for our team: this is just a workaround. We need to:
- disable zram
- set the large volume's swap priority to >= 100
... both while we generate the images.
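
Roughly, the image-generation fix could look like this (a sketch, assuming the builders get their zram swap from zram-generator the way Fedora does; the fstab line and device name are illustrative):

    # 1) disable zram swap entirely:
    dnf -y remove zram-generator-defaults

    # 2) mount the large swap volume with priority >= 100 via /etc/fstab:
    #    /dev/vdb2  none  swap  defaults,pri=100  0 0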

Thank you. I started a build (pypy3.9 with gcc 8) to see if the link step could go through. Will report back.

The build succeeded in about one hour, which verifies that 1. the memory issue has been resolved; and 2. as a side effect, the build time is also back to normal (reduced from >7 hours to ~1 hour).

Thank you!

Glad to hear that, and thank you for the confirmation. We still need a real fix (generating new VM images for our builders), therefore the issue is kept open.

The images are already generated, and we removed one leftover in #2136. So I hope everything works well now, and in the future :-)

Metadata Update from @praiskup:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 years ago
