https://koji.fedoraproject.org/koji/taskinfo?taskID=36286223 looks to be stuck. It's been finished for a while. Can someone check this and either release or just confirm I need to be patient?
Oddly, there's a bunch of defunct 'xz' processes. Not sure what happened. ;(
I'm going to try and restart kojid there and/or free it for another builder.
Metadata Update from @kevin: - Issue assigned to kevin - Issue priority set to: Waiting on Assignee (was: Needs Review)
So, I restarted kojid and it ran again and got stuck the same way. Then I freed it and it got stuck the same way on another builder. ;(
|-kojid,78021 /usr/sbin/kojid --fg --force-lock --verbose
|   `-kojid,14080 /usr/sbin/kojid --fg --force-lock --verbose
|       `-mock,14382 -tt /usr/libexec/mock/mock -r koji/f31-build-16879501-1217837 --old-chroot --no-clean --target aarch64 ...
|           `-rpmbuild,14633 -bb --target aarch64 --nodeps /builddir/build/SPECS/kernel.spec
|               |-{rpmbuild},23890
|               |-{rpmbuild},23891
|               |-{rpmbuild},23892
...
|               |-(xz,38123)
|               |-(xz,38128)
|               |-(xz,38130)
|               |-(xz,38131)
|               |-(xz,38132)
|               |-xz,38133 -cd
|               |-(xz,38134)
|               |-(xz,38135)
|               |-(xz,38136)
|               |-(xz,38137)
|               |-(xz,38138)
|               |-(xz,38139)
|               |-(xz,38140)
|               |-(xz,38141)
|               |-(xz,38142)
|               |-(xz,38143)
|               |-(xz,38144)
|               |-(xz,38145)
|               |-(xz,38146)
|               |-xz,38147 -cd
|               |-(xz,38230)
|               |-(xz,38234)
|               |-xz,38235 -cd
|               `-xz,38236 -cd
kojibui+ 38123 0.0 0.0 0 0 ? Z 02:29 0:00 [xz] <defunct>
kojibui+ 38128 0.0 0.0 0 0 ? Z 02:29 0:00 [xz] <defunct>
kojibui+ 38130 0.0 0.0 0 0 ? Z 02:29 0:00 [xz] <defunct>
kojibui+ 38131 0.0 0.0 0 0 ? Z 02:29 0:00 [xz] <defunct>
kojibui+ 38132 0.0 0.0 0 0 ? Z 02:29 0:00 [xz] <defunct>
kojibui+ 38133 0.0 0.0 12572 2720 ? S 02:29 0:00 xz -cd
kojibui+ 38134 0.0 0.0 0 0 ? Z 02:29 0:00 [xz] <defunct>
kojibui+ 38135 0.0 0.0 0 0 ? Z 02:29 0:00 [xz] <defunct>
kojibui+ 38136 0.0 0.0 0 0 ? Z 02:29 0:00 [xz] <defunct>
kojibui+ 38137 0.0 0.0 0 0 ? Z 02:29 0:00 [xz] <defunct>
kojibui+ 38138 0.0 0.0 0 0 ? Z 02:29 0:00 [xz] <defunct>
kojibui+ 38139 0.0 0.0 0 0 ? Z 02:29 0:00 [xz] <defunct>
kojibui+ 38140 0.0 0.0 0 0 ? Z 02:29 0:00 [xz] <defunct>
kojibui+ 38141 0.0 0.0 0 0 ? Z 02:29 0:00 [xz] <defunct>
kojibui+ 38142 0.0 0.0 0 0 ? Z 02:29 0:00 [xz] <defunct>
kojibui+ 38143 0.0 0.0 0 0 ? Z 02:29 0:00 [xz] <defunct>
kojibui+ 38144 0.0 0.0 0 0 ? Z 02:29 0:00 [xz] <defunct>
kojibui+ 38145 0.0 0.0 0 0 ? Z 02:29 0:00 [xz] <defunct>
kojibui+ 38146 0.0 0.0 0 0 ? Z 02:29 0:00 [xz] <defunct>
kojibui+ 38147 0.0 0.0 12572 2724 ? S 02:29 0:00 xz -cd
kojibui+ 38230 0.0 0.0 0 0 ? Z 02:29 0:00 [xz] <defunct>
kojibui+ 38234 0.0 0.0 0 0 ? Z 02:29 0:00 [xz] <defunct>
kojibui+ 38235 0.0 0.0 12572 2872 ? S 02:29 0:00 xz -cd
kojibui+ 38236 0.0 0.0 12572 2796 ? S 02:29 0:00 xz -cd
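For anyone triaging a similar hang, here is a minimal Python sketch that summarizes a ps listing like the one above by process state. It assumes the `ps aux` column layout (STAT in the eighth column), which is a guess at how the listing was produced. The helper names are made up for illustration. A "Z" entry is a process that has exited but has not yet been reaped by its parent, which here would be rpmbuild.

```python
from collections import Counter

# A few rows copied from the listing above.
# Assumed columns: user pid %cpu %mem vsz rss tty stat start time command
PS_SAMPLE = """\
kojibui+ 38132 0.0 0.0 0 0 ? Z 02:29 0:00 [xz] <defunct>
kojibui+ 38133 0.0 0.0 12572 2720 ? S 02:29 0:00 xz -cd
kojibui+ 38134 0.0 0.0 0 0 ? Z 02:29 0:00 [xz] <defunct>
kojibui+ 38236 0.0 0.0 12572 2796 ? S 02:29 0:00 xz -cd
"""


def parse_rows(ps_text):
    """Yield (pid, state_letter, command) for each well-formed row."""
    for line in ps_text.splitlines():
        # The command is the 11th field and may itself contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        yield int(fields[1]), fields[7][0], fields[10]


def count_states(ps_text):
    """Count rows per state letter (Z = zombie, S = interruptible sleep)."""
    return Counter(state for _pid, state, _cmd in parse_rows(ps_text))


def zombie_pids(ps_text):
    """Return the PIDs of all zombie (defunct) processes."""
    return [pid for pid, state, _cmd in parse_rows(ps_text) if state == "Z"]


if __name__ == "__main__":
    print(count_states(PS_SAMPLE))
    print(zombie_pids(PS_SAMPLE))
```

Run against the full listing, this kind of summary makes it easy to see that only the four `xz -cd` processes are still alive and that all of them are sleeping rather than running.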
I can't see anything that changed in rpm or xz recently.
Perhaps @pwhalen or @pbrobinson would have some idea...
Hmmmm, that might be the find command to manually compress all modules:
find $RPM_BUILD_ROOT/lib/modules/ -type f -name '*.ko' | xargs -P%{zcpu} xz; \
but some of those also look like they are decompressing (xz -cd)
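To make the distinction concrete, here is a rough Python analogue of that spec line, using the standard `lzma` module instead of the `xz` binary. It is only a sketch: the function names and the worker count are illustrative, and the real `xargs` call hands several files to each `xz` invocation rather than one. The point is that this step only compresses, so it would not by itself account for `xz -cd` (decompress to stdout) processes.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def compress_one(path):
    """Write <name>.xz next to the file and remove the original, like `xz FILE`."""
    src = Path(path)
    dst = src.with_name(src.name + ".xz")
    dst.write_bytes(lzma.compress(src.read_bytes()))  # default container is .xz
    src.unlink()
    return dst


def compress_modules(root, workers=4):
    """Rough analogue of: find ROOT -type f -name '*.ko' | xargs -P WORKERS xz"""
    modules = sorted(
        p for p in Path(root).rglob("*.ko")
        if p.is_file() and not p.is_symlink()  # mirrors find's -type f
    )
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress_one, modules))


def read_compressed(path):
    """Analogue of `xz -cd FILE`: return the decompressed bytes, leave the file alone."""
    return lzma.decompress(Path(path).read_bytes())
```

Nothing in `compress_modules` ever takes the decompress path, which is why the `xz -cd` processes most likely come from a different step of the build.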
Can you get a dump of all stack traces via sysrq? I'm curious what those 'xz -cd' processes are doing.
I diffed the working and non-working buildroots. The only things that stood out to me were a new coreutils with a patch to disable flashing for symbolic links and a new pkgconf version, and neither of them seems immediately suspicious.
sysrq l gives:
Jul 17 15:14:56 buildvm-aarch64-02 kernel: sysrq: SysRq : Show backtrace of all active CPUs
Jul 17 15:14:56 buildvm-aarch64-02 kernel: sysrq: CPU89:
Jul 17 15:14:56 buildvm-aarch64-02 kernel: Call trace:
Jul 17 15:14:56 buildvm-aarch64-02 kernel: dump_backtrace+0x0/0x160
Jul 17 15:14:56 buildvm-aarch64-02 kernel: show_stack+0x24/0x30
Jul 17 15:14:56 buildvm-aarch64-02 kernel: showacpu+0x8c/0xb8
Jul 17 15:14:56 buildvm-aarch64-02 kernel: flush_smp_call_function_queue+0x98/0x150
Jul 17 15:14:56 buildvm-aarch64-02 kernel: generic_smp_call_function_single_interrupt+0x18/0x20
Jul 17 15:14:56 buildvm-aarch64-02 kernel: handle_IPI+0x1c4/0x370
Jul 17 15:14:56 buildvm-aarch64-02 kernel: gic_handle_irq+0x13c/0x15c
Jul 17 15:14:56 buildvm-aarch64-02 kernel: el0_irq_naked+0x4c/0x54
sysrq t is...gigantic. https://infrastructure.fedoraproject.org/infra/tmp/dmesg-sysrq-l-20190717.xz
nothing in sysrq-w it seems.
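For reference, the three dumps used here can also be requested without a console by writing the key to /proc/sysrq-trigger as root. The output lands in the kernel log. The Python sketch below is illustrative only: the helper name and the small key table are made up for this example, and the `trigger_path` parameter exists purely so the function can be exercised against an ordinary file.

```python
# SysRq keys used in this ticket and what each one dumps.
SYSRQ_KEYS = {
    "l": "backtrace of all active CPUs",
    "t": "state and stack of every task",
    "w": "tasks in uninterruptible (blocked) state",
}


def trigger_sysrq(key, trigger_path="/proc/sysrq-trigger"):
    """Write a single SysRq key to the trigger file and return its description.

    Needs root when pointed at the real /proc/sysrq-trigger.
    """
    if key not in SYSRQ_KEYS:
        raise ValueError(f"unsupported SysRq key: {key!r}")
    with open(trigger_path, "w") as fh:
        fh.write(key)
    return SYSRQ_KEYS[key]


if __name__ == "__main__":
    # Example: request the blocked-task dump, then read it with dmesg or journalctl -k.
    print(trigger_sysrq("w"))
```

This also suggests why `l` and `w` showed so little: the surviving `xz -cd` processes were in state S (interruptible sleep) in the ps listing, so they were neither on a CPU nor blocked in uninterruptible sleep. Only the full task dump from `t` would include them.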
After losing a day to an unrelated kernel-headers issue, https://koji.fedoraproject.org/koji/taskinfo?taskID=36348544 looks like it's in a similar state. Can you double check this is the same bad state? This is a bad week and I haven't had time to think more about what to debug or check :(
Similar, but not the same.
This time, no xz processes, just rpmbuild taking up 200% cpu.
I straced it and got:
[pid 47840] fcntl(3, F_SETFD, FD_CLOEXEC) = 0
[pid 47840] read(3, "/* SPDX-License-Identifier: GPL-"..., 6370) = 6370
[pid 47840] close(3) = 0
[pid 47840] openat(AT_FDCWD, "/builddir/build/BUILDROOT/kernel-5.3.0-0.rc0.git7.1.fc31.aarch64/usr/src/debug/kernel-5.2.fc31/linux-5.3.0-0.rc0.git7.1.fc31.aarch64/include/linux/sunrpc/svcauth_gss.h", O_RDONLY) = 3
[pid 47840] fcntl(3, F_SETFD, FD_CLOEXEC) = 0
[pid 47840] read(3, "/* SPDX-License-Identifier: GPL-"..., 797) = 797
[pid 47840] close(3) = 0
[pid 47840] openat(AT_FDCWD, "/builddir/build/BUILDROOT/kernel-5.3.0-0.rc0.git7.1.fc31.aarch64/usr/src/debug/kernel-5.2.fc31/linux-5.3.0-0.rc0.git7.1.fc31.aarch64/include/linux/sunrpc/svcsock.h", O_RDONLY) = 3
[pid 47840] fcntl(3, F_SETFD, FD_CLOEXEC) = 0
[pid 47840] read(3, "/* SPDX-License-Identifier: GPL-"..., 2140) = 2140
[pid 47840] close(3) = 0
[pid 47840] openat(AT_FDCWD, "/builddir/build/BUILDROOT/kernel-5.3.0-0.rc0.git7.1.fc31.aarch64/usr/src/debug/kernel-5.2.fc31/linux-5.3.0-0.rc0.git7.1.fc31.aarch64/include/linux/sunrpc/timer.h", O_RDONLY) = 3
[pid 47840] fcntl(3, F_SETFD, FD_CLOEXEC) = 0
[pid 47840] read(3, "/* SPDX-License-Identifier: GPL-"..., 1172) = 1172
[pid 47840] close(3) = 0
[pid 47840] openat(AT_FDCWD, "/builddir/build/BUILDROOT/kernel-5.3.0-0.rc0.git7.1.fc31.aarch64/usr/src/debug/kernel-5.2.fc31/linux-5.3.0-0.rc0.git7.1.fc31.aarch64/include/linux/sunrpc/xdr.h", O_RDONLY) = 3
[pid 47840] fcntl(3, F_SETFD, FD_CLOEXEC) = 0
[pid 47840] read(3, "/* SPDX-License-Identifier: GPL-"..., 16360) = 16360
[pid 47840] close(3) = 0
[pid 47840] openat(AT_FDCWD, "/builddir/build/BUILDROOT/kernel-5.3.0-0.rc0.git7.1.fc31.aarch64/usr/src/debug/kernel-5.2.fc31/linux-5.3.0-0.rc0.git7.1.fc31.aarch64/include/linux/sunrpc/xprt.h", O_RDONLY) = 3
[pid 47840] fcntl(3, F_SETFD, FD_CLOEXEC) = 0
[pid 47840] read(3, "/* SPDX-License-Identifier: GPL-"..., 15562) = 15562
[pid 47840] close(3) = 0
[pid 47840] openat(AT_FDCWD, "/builddir/build/BUILDROOT/kernel-5.3.0-0.rc0.git7.1.fc31.aarch64/usr/src/debug/kernel-5.2.fc31/linux-5.3.0-0.rc0.git7.1.fc31.aarch64/include/linux/sunrpc/xprtmultipath.h", O_RDONLY) = 3
[pid 47840] fcntl(3, F_SETFD, FD_CLOEXEC) = 0
[pid 47840] read(3, "/* SPDX-License-Identifier: GPL-"..., 2051) = 2051
[pid 47840] close(3) = 0
[pid 47840] openat(AT_FDCWD, "/builddir/build/BUILDROOT/kernel-5.3.0-0.rc0.git7.1.fc31.aarch64/usr/src/debug/kernel-5.2.fc31/linux-5.3.0-0.rc0.git7.1.fc31.aarch64/include/linux/sunrpc/xprtrdma.h", O_RDONLY) = 3
[pid 47840] fcntl(3, F_SETFD, FD_CLOEXEC) = 0
[pid 47840] read(3, "/* SPDX-License-Identifier: GPL-"..., 3021) = 3021
...
some kind of loop in debuginfo generation?
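One way to test the loop theory on a longer capture is to count how often each path is opened: a real loop would show the same files again and again, while slow linear progress opens each file once. The Python sketch below does that for strace text. The helper names and the threshold of 3 are arbitrary choices for illustration, and the sample is built from three of the paths in the excerpt above rather than from a full trace.

```python
import re
from collections import Counter

# Matches the path argument of openat(AT_FDCWD, "<path>", ...) in strace output.
OPENAT_RE = re.compile(r'openat\(AT_FDCWD, "([^"]+)"')

# Directory taken from the strace excerpt in this ticket.
PREFIX = (
    "/builddir/build/BUILDROOT/kernel-5.3.0-0.rc0.git7.1.fc31.aarch64"
    "/usr/src/debug/kernel-5.2.fc31/linux-5.3.0-0.rc0.git7.1.fc31.aarch64"
    "/include/linux/sunrpc/"
)

SAMPLE = "\n".join(
    f'[pid 47840] openat(AT_FDCWD, "{PREFIX}{name}", O_RDONLY) = 3'
    for name in ("xdr.h", "xprt.h", "xprtmultipath.h")
)


def open_counts(strace_text):
    """Count openat() calls per path."""
    return Counter(OPENAT_RE.findall(strace_text))


def looks_like_loop(strace_text, threshold=3):
    """Return True if any single path was opened at least `threshold` times."""
    return any(n >= threshold for n in open_counts(strace_text).values())


if __name__ == "__main__":
    print(open_counts(SAMPLE).most_common(3))
    print(looks_like_loop(SAMPLE))
```

In the excerpt above every header is opened exactly once and in alphabetical order, which points more toward a slow pass over the debug sources than toward an infinite loop.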
So https://koji.fedoraproject.org/koji/taskinfo?taskID=36348544 did eventually finish. I suspect there's an underlying issue with debuginfo generation. I'm going to close this ticket for now; we can reopen it if we start hitting this again.
Metadata Update from @labbott: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)