Since this Monday (I was on PTO last week) I can't finish any Firefox build on koji. I tried to cancel the builds and launch them again, but they freeze again. Here are examples of the latest builds:
https://koji.fedoraproject.org/koji/taskinfo?taskID=36440626
https://koji.fedoraproject.org/koji/taskinfo?taskID=36445084
https://koji.fedoraproject.org/koji/taskinfo?taskID=36445086
This affects F29/30/31. There's no error message; the builds just hang. Some arches finish and others freeze, seemingly at random.
We have made changes to the ppc64le builders this week, but nothing else has really changed on the other builders.
In the f29 build, the x86_64 and i686 builds both seem to have a bunch of defunct gmake processes:
|-kojid,1377 /usr/sbin/kojid --fg --force-lock --verbose
|   `-kojid,24816 /usr/sbin/kojid --fg --force-lock --verbose
|       `-mock,25134 -tt /usr/libexec/mock/mock -r koji/f29-build-16954392-1221801 --old-chroot --no-clean --target i686 ...
|           `-rpmbuild,25627 -bb --target i686 --nodeps /builddir/build/SPECS/firefox.spec
|               `-sh,25670 -e /var/tmp/rpm-tmp.nDZ5N1
|                   `-python2.7,26934 ./mach build
|                       |-python2.7,26973 ./mach build
|                       |-gmake,30804 -f client.mk -s
|                       |   `-gmake,30807 -j2 -C /builddir/build/BUILD/firefox-68.0.1/objdir
|                       |       `-gmake,32199 compile
|                       |           `-gmake,32202 recurse_compile
|                       |               |-(gmake,32203)
|                       |               `-(gmake,32204)
|                       |-{python2.7},30805
|                       `-{python2.7},30806

kojibui+ 32203 0.0 0.0 0 0 ? Z Jul23 0:00 [gmake] <defunct>
kojibui+ 32204 0.0 0.0 0 0 ? Z Jul23 0:00 [gmake] <defunct>
The f31 aarch64 one also has defunct gcc/g++ processes:
kojibui+ 20270 0.0 0.0 0 0 ? Z Jul23 0:00 [gmake] <defunct>
kojibui+ 20273 0.0 0.0 0 0 ? Z Jul23 0:00 [gmake] <defunct>
kojibui+ 20283 0.0 0.0 0 0 ? Z Jul23 0:00 [gcc] <defunct>
kojibui+ 20296 0.0 0.0 0 0 ? Z Jul23 0:00 [g++] <defunct>
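For anyone wanting to check other builders for the same symptom, here is a rough sketch of how the defunct entries could be pulled out of `ps aux` text. It assumes the standard `ps aux` column layout seen in the listings here; the `find_zombies` helper and the `ZOMBIE_ROW` pattern are made-up names for illustration, not part of kojid or any existing tool.

```python
import re

# Matches one `ps aux` row whose STAT column starts with "Z" (zombie).
# Assumed columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
ZOMBIE_ROW = re.compile(
    r"^(?P<user>\S+)\s+(?P<pid>\d+)\s+\S+\s+\S+\s+\d+\s+\d+\s+\S+\s+"
    r"(?P<stat>Z\S*)\s+\S+\s+\S+\s+\[(?P<name>[^\]]+)\]"
)


def find_zombies(ps_output):
    """Return (pid, name) pairs for every defunct process in `ps aux` text."""
    zombies = []
    for line in ps_output.splitlines():
        match = ZOMBIE_ROW.match(line.strip())
        if match:
            zombies.append((int(match.group("pid")), match.group("name")))
    return zombies


# Sample rows copied from the aarch64 listing in this ticket.
sample = """\
kojibui+ 20270 0.0 0.0 0 0 ? Z Jul23 0:00 [gmake] <defunct>
kojibui+ 20283 0.0 0.0 0 0 ? Z Jul23 0:00 [gcc] <defunct>
kojibui+ 20296 0.0 0.0 0 0 ? Z Jul23 0:00 [g++] <defunct>
"""

if __name__ == "__main__":
    for pid, name in find_zombies(sample):
        print(pid, name)
```

On a live builder the same function could be fed the output of `ps aux` captured with `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout`.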
I assume local builds work ok? Does a scratch build of a previously working version work? Could there be any changes in the package or the buildroot to account for this?
Metadata Update from @kevin: - Issue assigned to kevin - Issue priority set to: Waiting on Assignee (was: Needs Review)
Local builds work fine. I can try to reduce the number of parallel builds.
I don't see any OOM hits in dmesg or anything either. :(
https://koji.fedoraproject.org/koji/taskinfo?taskID=36547978
This is a scratch build of the 68.0 package, which built fine 3 weeks ago. It now shows the same issue as reported here.
BTW, if it matters, the same packages (68.0.1) build fine in brew (the RH build system).
BTW, it looks like https://github.com/nodejs/node/issues/14752 (nodejs fiddles with stdout, which blocks the build). We had this problem with the flatpak builder and the RHEL brew builder, but not in koji.
We use a node-stdout-nonblocking-wrapper to fix that. I added it to 68.0.1 builds but it doesn't seem to have any effect.
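To make the stdout theory easier to test, here is a minimal sketch of checking and clearing the non-blocking flag on a file descriptor. It is not the node-stdout-nonblocking-wrapper itself, and it assumes the stdout change in question is the `O_NONBLOCK` flag, as the wrapper's name suggests. The helper names `is_nonblocking` and `force_blocking` are made up for illustration, and the `fcntl` module is Unix-only.

```python
import fcntl
import os


def is_nonblocking(fd):
    """Return True if the O_NONBLOCK status flag is set on fd."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    return bool(flags & os.O_NONBLOCK)


def force_blocking(fd):
    """Clear O_NONBLOCK on fd, leaving the other status flags untouched."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags & ~os.O_NONBLOCK)


if __name__ == "__main__":
    # Demonstrate on a throwaway pipe rather than the real stdout.
    read_end, write_end = os.pipe()
    os.set_blocking(write_end, False)   # simulate a child flipping the flag
    print("non-blocking before:", is_nonblocking(write_end))
    force_blocking(write_end)
    print("non-blocking after:", is_nonblocking(write_end))
    os.close(read_end)
    os.close(write_end)
```

Calling `is_nonblocking(1)` from inside the build step would show whether stdout has actually been switched at that point.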
About half of the builds have finished now, after 2 days. The scratch build (https://koji.fedoraproject.org/koji/taskinfo?taskID=36547978) is also almost done; the only missing arch is x86_64.
It looks to me like the koji builders are just very, very slow.
@stransky F29 has only a finished ppc64 build: https://koji.fedoraproject.org/koji/taskinfo?taskID=36496393 All others are still not finished. F30 looks completely different as you mentioned.
So, it's the same thing.
rawhide / i686:
|-kojid,7256 /usr/sbin/kojid --fg --force-lock --verbose
|   `-kojid,23432 /usr/sbin/kojid --fg --force-lock --verbose
|       `-mock,23734 -tt /usr/libexec/mock/mock -r koji/f31-build-17089753-1223218 --old-chroot --no-clean --target x86_64 ...
|           `-rpmbuild,24179 -bb --target x86_64 --nodeps /builddir/build/SPECS/firefox.spec
|               `-sh,24224 -e /var/tmp/rpm-tmp.ETfxit
|                   `-xvfb-run,25491 /usr/bin/xvfb-run ./mach build
|                       |-Xvfb,25500 :99 -screen 0 640x480x24 -nolisten tcp
|                       `-python2.7,25504 ./mach build
|                           |-python2.7,25539 ./mach build
|                           |-gmake,29481 -f client.mk -s
|                           |   `-gmake,29484 -j2 -C /builddir/build/BUILD/firefox-68.0.1/objdir
|                           |       `-gmake,6694 default MOZ_PROFILE_USE=1
|                           |           `-gmake,8145 compile
|                           |               `-gmake,8148 recurse_compile
|                           |                   |-(gmake,8149)
|                           |                   `-(gmake,8150)
|                           |-{python2.7},29482
|                           `-{python2.7},29483

kojibui+ 8149 0.0 0.0 0 0 ? Z 19:09 0:00 [gmake] <defunct>
kojibui+ 8150 0.0 0.0 0 0 ? Z 19:09 0:00 [gmake] <defunct>
rawhide / x86_64:
kojibui+ 17024 0.0 0.0 0 0 ? Z 16:48 0:00 [gmake] <defunct>
kojibui+ 17365 0.0 0.0 0 0 ? Z 16:55 0:00 [g++] <defunct>
It seems to affect any arch. The mass rebuild finished without too much trouble, so I am not seeing other things hang this way.
I'm not really sure where to look next.
Hi @xhorak and @stransky,
the builds firefox-68.0.1-3.fc30 and firefox-68.0.1-3.fc31 are finished. Is it also possible to have a finished firefox-68.0.1-3.fc29 build? The build for F29 was cancelled.
I see new builds working in koji now. Was this sorted out?
This is not a koji issue, it seems to be a rust bug: https://github.com/rust-lang/cargo/issues/7200
As a workaround we need to build with -j1 (no multiprocess).
ok. Hope it's fixed/reverted soon.
Feel free to re-open this or file a new issue if there is anything more we can do from our side.
:sparkler:
Metadata Update from @kevin: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)