I wasn't having luck building pypy3.8 or pypy3.9~rc1 packages (EPEL 8). Usually (e.g., on a local box) the build takes less than 2 hours to finish. On Copr, after 6 hours, the build failed at the final linking step, complaining "/usr/bin/ld.gold: fatal error: /tmp/ccFkmvEddebugobj: Cannot allocate memory". See for example build 3349259.
/usr/bin/ld.gold: fatal error: /tmp/ccFkmvEddebugobj: Cannot allocate memory
My guess is that the available memory is less than the 4.5G suggested in the .spec file. The build log happens to run a free command, and it does seem to show low memory but abundant swap space:
```
+ free
              total        used        free      shared  buff/cache   available
Mem:        4007912      825508      156696     2896944     3025708      100976
Swap:     151291164      620032   150671132
```
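For anyone reading such a log: the "available" column of the Mem: row (not "free") is the realistic estimate of how much memory a new allocation can get. A small sketch of how one might check that before a heavy link step (the 4.5G threshold comes from the .spec file's suggestion; the column position assumes the standard procps free layout):

```shell
# Extract the "available" figure (KiB, 7th field of the Mem: row) from free:
avail_kib=$(free | awk '/^Mem:/ {print $7}')

# Warn when less than the ~4.5G the .spec file suggests is actually available:
if [ "${avail_kib}" -lt $((4500 * 1024)) ]; then
    echo "warning: less than 4.5G available; the final link may fail" >&2
fi
```

On the log above this would print the warning, since only ~100M was available.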
I wonder if there's a way to request a little bit more memory for a build, or, if I am doing something improperly (that leads to the large memory use). Thank you for any suggestions!
Commit 4fe6812f relates to this ticket
See also: https://bugzilla.redhat.com/show_bug.cgi?id=2051154
I wonder if there's a way to request a little bit more memory for a build
I'm afraid the answer is negative; we have a single performance category for all the builders and it is not possible to increase the RAM across all.
Why doesn't swap help here? Can this be OOM-related?
@praiskup Thank you for turning off systemd-oomd. After your message, assuming that change was already in place, I resubmitted a build, but it failed again at the same linking point with "Cannot allocate memory" (build 3382615).
I don't really know why swap doesn't help here. I actually tried a) turning off LTO, and b) using ld instead of gold. Neither helped. But it could indeed relate to some changes in Copr, because similar builds by other people were quite successful, and also much shorter: about 1 hour versus my 6 hours.
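For reference, the two experiments mentioned above can be sketched roughly as follows. The flag values are the common Fedora/EPEL defaults (in a spec file, disabling LTO is usually done with `%define _lto_cflags %{nil}`); the actual pypy spec may wire this differently:

```shell
# a) Drop the LTO flag from the compiler flags (assumed default flag set):
CFLAGS="-O2 -flto=auto -g"
CFLAGS="${CFLAGS/-flto=auto /}"      # bash pattern substitution

# b) Ask gcc to link with classic BFD ld instead of gold:
LDFLAGS="${LDFLAGS} -fuse-ld=bfd"

echo "CFLAGS=${CFLAGS}"
echo "LDFLAGS=${LDFLAGS}"
```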
@praiskup Thank you for turning off systemd-oomd. After your message, assuming that change was already in place,
I realized I did not push this to production; in production it was like ~10 minutes after that message for sure.
But it could indeed relate to some changes in copr,
You mean a change in your Copr project, not a build-system change?
They were also much shorter in time, i.e., 1hr vs 6hr
This is weird, indeed.
So, in the very same bug we were suggested to test vm.overcommit_memory = 2. Can you please give the build another try?
vm.overcommit_memory = 2
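With mode 2 (strict accounting), the kernel refuses address-space commitments beyond swap plus overcommit_ratio% of RAM, instead of overcommitting and letting an OOM kill (or a failing mmap) hit the linker mid-build. A sketch of how the setting would be applied on a builder (needs root):

```shell
# Switch to strict overcommit accounting for the running kernel:
sysctl -w vm.overcommit_memory=2

# To persist across reboots, a config fragment such as
# /etc/sysctl.d/90-overcommit.conf would contain:
#   vm.overcommit_memory = 2
```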
Metadata Update from @praiskup: - Issue assigned to praiskup
Thank you, and yes, I just started a new build.
But it could indeed relate to some changes in copr, You mean change in your copr project, not the build system change?
I actually meant the build system, after reading the other thread you referred to. I didn't change my Copr project, and it has never successfully built the packages on Copr; the builds always failed due to memory issues, except once when the build was allocated to a VM with larger memory (build 03330365; search the log for "+ free" and you'll see the VM had 15G of memory, larger than the usual 4G).
Hope that the vm.overcommit_memory change would fix the issue.
Thank you!
Unfortunately the new build failed with the same error.
Ok, thanks, any idea is welcome :-/
As proposed in the mentioned bug, we are completely OK with providing root access to one builder virtual machine and letting anyone experiment with it (tweak the kernel, etc.).
Here are a few more observations:
gcc-toolset-11
To summarize:
Although I have no idea why the build takes much longer now, I think the initial concern of this issue is mostly solved (by switching to gcc 11) and is also tracked in bug #2051154 (swap is not used). So, to avoid taking more of your time, I think this issue can be closed.
Metadata Update from @praiskup: - Issue tagged with: bug
Somehow the pypy3.9 builds take much longer with the current VM, and not just my builds: churchyard's builds used to take 1 hour (on Fedora) and now take 4 hours.
There are two kinds of builders: on-premise VMs with 4G RAM plus lots of swap, and AWS VMs with 16G RAM plus lots of swap. This is likely the reason; it is much more likely to get the slower machine, since we use AWS only when the other builders aren't enough to process the queue quickly.
Ok, I think we can keep tracking this in rhbz#2051154. Thank you for all the input!
Metadata Update from @praiskup: - Issue close_status updated to: External - Issue status updated to: Closed (was: Open)
Metadata Update from @praiskup: - Issue status updated to: Open (was: Closed)
It seems the reason is that we have both zram0 swap and normal volume swap, but zram0 has higher priority. This means zram0 is used by default, and then the kernel probably fails to move part of a swapped area to the other volume.
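The priorities can be inspected on any machine: the kernel fills the highest-priority swap device first. A small sketch that prints each device with its priority from /proc/swaps (on the affected builder this would show zram0 outranking the large volume swap):

```shell
# Print "device prio=N" for every active swap area; skip the header row.
awk 'NR > 1 { printf "%s prio=%s\n", $1, $5 }' /proc/swaps
```

`swapon --show=NAME,TYPE,SIZE,PRIO` gives the same information in a friendlier table.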
Commit 50e91eed relates to this ticket
Thank you. I started a build (pypy3.9 with gcc 8) to see if the link step could go through. Will report back.
Note for our team: this is just a workaround. We still need to:
- disable zram
- set the large swap volume's priority to >= 100

... both while we generate the images.
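On a live builder, those two steps would look roughly like this (the device names here are assumptions for illustration; the real ones would come from `swapon --show`, and the commands need root):

```shell
swapoff /dev/zram0           # stop using the compressed in-RAM swap (assumed name)
swapoff /dev/vdb1            # detach the large volume swap (assumed name) ...
swapon -p 100 /dev/vdb1      # ... and re-attach it with an explicit priority >= 100
```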
The build succeeded in about one hour, which verifies that 1. the memory issue has been resolved, and 2. as a side effect, the build time also went back to normal (reduced from over 7 hours to about 1 hour).
Glad to hear that, thank you for the confirmation. We still need a real fix - generate new VM images for our builders (therefore the issue is kept open).
The images are already generated, and we removed one leftover in #2136. So I hope everything works well now, and in the future :-)
Metadata Update from @praiskup: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)