#8007 Check on 'stuck' kernel build
Closed: Fixed 4 years ago by labbott. Opened 4 years ago by labbott.

https://koji.fedoraproject.org/koji/taskinfo?taskID=36286223 looks to be stuck. It's been finished for a while. Can someone check this and either release or just confirm I need to be patient?


Oddly, there's a bunch of defunct 'xz' processes. Not sure what happened. ;(

I'm going to try and restart kojid there and/or free it for another builder.

Metadata Update from @kevin:
- Issue assigned to kevin
- Issue priority set to: Waiting on Assignee (was: Needs Review)

4 years ago

So, I restarted kojid and it ran again and got stuck the same way. Then I freed it and it got stuck the same way on another builder. ;(

  |-kojid,78021 /usr/sbin/kojid --fg --force-lock --verbose
  |   `-kojid,14080 /usr/sbin/kojid --fg --force-lock --verbose
  |       `-mock,14382 -tt /usr/libexec/mock/mock -r koji/f31-build-16879501-1217837 --old-chroot --no-clean --target aarch64 ...
  |           `-rpmbuild,14633 -bb --target aarch64 --nodeps /builddir/build/SPECS/kernel.spec
  |               |-{rpmbuild},23890
  |               |-{rpmbuild},23891
  |               |-{rpmbuild},23892
...
  |               |-(xz,38123)
  |               |-(xz,38128)
  |               |-(xz,38130)
  |               |-(xz,38131)
  |               |-(xz,38132)
  |               |-xz,38133 -cd
  |               |-(xz,38134)
  |               |-(xz,38135)
  |               |-(xz,38136)
  |               |-(xz,38137)
  |               |-(xz,38138)
  |               |-(xz,38139)
  |               |-(xz,38140)
  |               |-(xz,38141)
  |               |-(xz,38142)
  |               |-(xz,38143)
  |               |-(xz,38144)
  |               |-(xz,38145)
  |               |-(xz,38146)
  |               |-xz,38147 -cd
  |               |-(xz,38230)
  |               |-(xz,38234)
  |               |-xz,38235 -cd
  |               `-xz,38236 -cd
kojibui+  38123  0.0  0.0      0     0 ?        Z    02:29   0:00 [xz] <defunct>
kojibui+  38128  0.0  0.0      0     0 ?        Z    02:29   0:00 [xz] <defunct>
kojibui+  38130  0.0  0.0      0     0 ?        Z    02:29   0:00 [xz] <defunct>
kojibui+  38131  0.0  0.0      0     0 ?        Z    02:29   0:00 [xz] <defunct>
kojibui+  38132  0.0  0.0      0     0 ?        Z    02:29   0:00 [xz] <defunct>
kojibui+  38133  0.0  0.0  12572  2720 ?        S    02:29   0:00 xz -cd
kojibui+  38134  0.0  0.0      0     0 ?        Z    02:29   0:00 [xz] <defunct>
kojibui+  38135  0.0  0.0      0     0 ?        Z    02:29   0:00 [xz] <defunct>
kojibui+  38136  0.0  0.0      0     0 ?        Z    02:29   0:00 [xz] <defunct>
kojibui+  38137  0.0  0.0      0     0 ?        Z    02:29   0:00 [xz] <defunct>
kojibui+  38138  0.0  0.0      0     0 ?        Z    02:29   0:00 [xz] <defunct>
kojibui+  38139  0.0  0.0      0     0 ?        Z    02:29   0:00 [xz] <defunct>
kojibui+  38140  0.0  0.0      0     0 ?        Z    02:29   0:00 [xz] <defunct>
kojibui+  38141  0.0  0.0      0     0 ?        Z    02:29   0:00 [xz] <defunct>
kojibui+  38142  0.0  0.0      0     0 ?        Z    02:29   0:00 [xz] <defunct>
kojibui+  38143  0.0  0.0      0     0 ?        Z    02:29   0:00 [xz] <defunct>
kojibui+  38144  0.0  0.0      0     0 ?        Z    02:29   0:00 [xz] <defunct>
kojibui+  38145  0.0  0.0      0     0 ?        Z    02:29   0:00 [xz] <defunct>
kojibui+  38146  0.0  0.0      0     0 ?        Z    02:29   0:00 [xz] <defunct>
kojibui+  38147  0.0  0.0  12572  2724 ?        S    02:29   0:00 xz -cd
kojibui+  38230  0.0  0.0      0     0 ?        Z    02:29   0:00 [xz] <defunct>
kojibui+  38234  0.0  0.0      0     0 ?        Z    02:29   0:00 [xz] <defunct>
kojibui+  38235  0.0  0.0  12572  2872 ?        S    02:29   0:00 xz -cd
kojibui+  38236  0.0  0.0  12572  2796 ?        S    02:29   0:00 xz -cd
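
For anyone reading along: a `Z` (defunct) entry is a child that has already exited but has not yet been reaped by its parent, which here is rpmbuild. A minimal sketch for tallying process states from `ps aux`-style lines like the ones above; the function name is made up for illustration and the column layout is assumed to match this listing.

```python
from collections import Counter

def summarize_ps_states(ps_lines):
    """Count processes per state letter (Z, S, D, ...) in `ps aux`-style lines."""
    counts = Counter()
    for line in ps_lines:
        # Assumed columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip headers and anything that is not a process row
        counts[fields[7][0]] += 1  # first letter of STAT is the state
    return counts

# Three rows copied from the listing above.
sample = [
    "kojibui+  38132  0.0  0.0      0     0 ?        Z    02:29   0:00 [xz] <defunct>",
    "kojibui+  38133  0.0  0.0  12572  2720 ?        S    02:29   0:00 xz -cd",
    "kojibui+  38134  0.0  0.0      0     0 ?        Z    02:29   0:00 [xz] <defunct>",
]
print(summarize_ps_states(sample))
```

Run over the full listing, this would separate the zombies from the few `xz -cd` processes that are still alive and sleeping.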

I can't see anything that changed in rpm or xz recently.

Perhaps @pwhalen or @pbrobinson would have some idea...

Hmmmm, that might be the find command to manually compress all modules:

find $RPM_BUILD_ROOT/lib/modules/ -type f -name '*.ko' | xargs -P%{zcpu} xz; \

  • find /builddir/build/BUILDROOT/kernel-5.3.0-0.rc0.git5.1.fc31.aarch64/lib/modules/ -type f -name '*.ko'
    BUILDSTDERR: ++ nproc --all
  • xargs -P123 xz

but some of those also look like decompression (xz -cd)
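
To make the compression step concrete, here is a rough Python analogue of `find ... -name '*.ko' | xargs -P<n> xz` with an explicit cap on parallelism. This is only a sketch, not what the kernel spec does: `compress_module`, `compress_tree` and `max_workers` are names invented for illustration, and threads stand in for the separate `xz` processes that xargs would spawn.

```python
import lzma
import os
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def compress_module(path):
    """Write <path>.xz and remove the original, roughly what plain `xz` does."""
    src = Path(path)
    dst = src.with_name(src.name + ".xz")
    with open(src, "rb") as fin, lzma.open(dst, "wb") as fout:
        for chunk in iter(lambda: fin.read(1 << 20), b""):
            fout.write(chunk)
    src.unlink()
    return dst

def compress_tree(root, max_workers=None):
    """Compress every regular *.ko file under root with a bounded worker pool."""
    modules = sorted(p for p in Path(root).rglob("*.ko") if p.is_file())
    # Assumption: default to the CPU count, similar in spirit to -P%{zcpu}.
    workers = max_workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress_module, modules))
```

Something like `compress_tree(buildroot_modules_dir, max_workers=16)` would keep the fan-out well below the `-P123` seen in the log, which might be a cheap experiment if the high process count turns out to matter.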

Can you get a dump of all stack traces via sysrq? I'm curious what those 'xz -cd' processes are doing.

I diffed the working and non-working buildroots. The only things that stood out to me were a new coreutils with a patch to disable flashing for symbolic links and a new pkgconf version, and neither of them seems immediately suspicious.
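
For reference, a buildroot comparison like this can be sketched as a diff of two package-to-version mappings. The helper name and every version string below are made up for illustration; only the package names come from this ticket.

```python
def diff_buildroots(old, new):
    """Compare two {package name: version-release} mappings."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = {
        name: (old[name], new[name])
        for name in sorted(set(old) & set(new))
        if old[name] != new[name]
    }
    return added, removed, changed

# Hypothetical data: these version strings are NOT the real ones.
working = {"coreutils": "1.0-1", "pkgconf": "1.0-1", "xz": "1.0-1"}
failing = {"coreutils": "1.0-2", "pkgconf": "1.1-1", "xz": "1.0-1"}

added, removed, changed = diff_buildroots(working, failing)
print(changed)
```

An unchanged `xz` entry in such a diff would fit the earlier observation that nothing changed in rpm or xz recently.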

sysrq l gives:

Jul 17 15:14:56 buildvm-aarch64-02 kernel: sysrq: SysRq : Show backtrace of all active CPUs
Jul 17 15:14:56 buildvm-aarch64-02 kernel: sysrq: CPU89:
Jul 17 15:14:56 buildvm-aarch64-02 kernel: Call trace:
Jul 17 15:14:56 buildvm-aarch64-02 kernel: dump_backtrace+0x0/0x160
Jul 17 15:14:56 buildvm-aarch64-02 kernel: show_stack+0x24/0x30
Jul 17 15:14:56 buildvm-aarch64-02 kernel: showacpu+0x8c/0xb8
Jul 17 15:14:56 buildvm-aarch64-02 kernel: flush_smp_call_function_queue+0x98/0x150
Jul 17 15:14:56 buildvm-aarch64-02 kernel: generic_smp_call_function_single_interrupt+0x18/0x20
Jul 17 15:14:56 buildvm-aarch64-02 kernel: handle_IPI+0x1c4/0x370
Jul 17 15:14:56 buildvm-aarch64-02 kernel: gic_handle_irq+0x13c/0x15c
Jul 17 15:14:56 buildvm-aarch64-02 kernel: el0_irq_naked+0x4c/0x54

sysrq t is...gigantic. https://infrastructure.fedoraproject.org/infra/tmp/dmesg-sysrq-l-20190717.xz

nothing in sysrq-w it seems.
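
For the record, the three dumps above correspond to writing a single key to `/proc/sysrq-trigger` as root. A small sketch of that, with the helper name and the `trigger_path` parameter invented here so it can be pointed at a scratch file instead of the real trigger:

```python
SYSRQ_KEYS = {
    "l": "backtrace of all active CPUs",
    "t": "state and stack of every task",
    "w": "tasks in uninterruptible (blocked) state",
}

def trigger_sysrq(key, trigger_path="/proc/sysrq-trigger"):
    """Write one SysRq key to the trigger file; output lands in the kernel log."""
    if key not in SYSRQ_KEYS:
        raise ValueError(f"unsupported sysrq key: {key!r}")
    with open(trigger_path, "w") as f:
        f.write(key)
    return SYSRQ_KEYS[key]
```

An empty sysrq-w would be consistent with the ps listing, where the live `xz -cd` processes are in state S (interruptible sleep) rather than D, so they would not be expected to show up as blocked tasks.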

After losing a day to an unrelated kernel-headers issue, https://koji.fedoraproject.org/koji/taskinfo?taskID=36348544 looks like it's in a similar state. Can you double check this is the same bad state? This is a bad week and I haven't had time to think more about what to debug or check :(

Similar, but not the same.

This time, no xz processes, just rpmbuild taking up 200% cpu.

I straced it and got:

[pid 47840] fcntl(3, F_SETFD, FD_CLOEXEC) = 0
[pid 47840] read(3, "/* SPDX-License-Identifier: GPL-"..., 6370) = 6370
[pid 47840] close(3)                    = 0
[pid 47840] openat(AT_FDCWD, "/builddir/build/BUILDROOT/kernel-5.3.0-0.rc0.git7.1.fc31.aarch64/usr/src/debug/kernel-5.2.fc31/linux-5.3.0-0.rc0.git7.1.fc31.aarch64/include/linux/sunrpc/svcauth_gss.h", O_RDONLY) = 3
[pid 47840] fcntl(3, F_SETFD, FD_CLOEXEC) = 0
[pid 47840] read(3, "/* SPDX-License-Identifier: GPL-"..., 797) = 797
[pid 47840] close(3)                    = 0
[pid 47840] openat(AT_FDCWD, "/builddir/build/BUILDROOT/kernel-5.3.0-0.rc0.git7.1.fc31.aarch64/usr/src/debug/kernel-5.2.fc31/linux-5.3.0-0.rc0.git7.1.fc31.aarch64/include/linux/sunrpc/svcsock.h", O_RDONLY) = 3
[pid 47840] fcntl(3, F_SETFD, FD_CLOEXEC) = 0
[pid 47840] read(3, "/* SPDX-License-Identifier: GPL-"..., 2140) = 2140
[pid 47840] close(3)                    = 0
[pid 47840] openat(AT_FDCWD, "/builddir/build/BUILDROOT/kernel-5.3.0-0.rc0.git7.1.fc31.aarch64/usr/src/debug/kernel-5.2.fc31/linux-5.3.0-0.rc0.git7.1.fc31.aarch64/include/linux/sunrpc/timer.h", O_RDONLY) = 3
[pid 47840] fcntl(3, F_SETFD, FD_CLOEXEC) = 0
[pid 47840] read(3, "/* SPDX-License-Identifier: GPL-"..., 1172) = 1172
[pid 47840] close(3)                    = 0
[pid 47840] openat(AT_FDCWD, "/builddir/build/BUILDROOT/kernel-5.3.0-0.rc0.git7.1.fc31.aarch64/usr/src/debug/kernel-5.2.fc31/linux-5.3.0-0.rc0.git7.1.fc31.aarch64/include/linux/sunrpc/xdr.h", O_RDONLY) = 3
[pid 47840] fcntl(3, F_SETFD, FD_CLOEXEC) = 0
[pid 47840] read(3, "/* SPDX-License-Identifier: GPL-"..., 16360) = 16360
[pid 47840] close(3)                    = 0
[pid 47840] openat(AT_FDCWD, "/builddir/build/BUILDROOT/kernel-5.3.0-0.rc0.git7.1.fc31.aarch64/usr/src/debug/kernel-5.2.fc31/linux-5.3.0-0.rc0.git7.1.fc31.aarch64/include/linux/sunrpc/xprt.h", O_RDONLY) = 3
[pid 47840] fcntl(3, F_SETFD, FD_CLOEXEC) = 0
[pid 47840] read(3, "/* SPDX-License-Identifier: GPL-"..., 15562) = 15562
[pid 47840] close(3)                    = 0
[pid 47840] openat(AT_FDCWD, "/builddir/build/BUILDROOT/kernel-5.3.0-0.rc0.git7.1.fc31.aarch64/usr/src/debug/kernel-5.2.fc31/linux-5.3.0-0.rc0.git7.1.fc31.aarch64/include/linux/sunrpc/xprtmultipath.h", O_RDONLY) = 3
[pid 47840] fcntl(3, F_SETFD, FD_CLOEXEC) = 0
[pid 47840] read(3, "/* SPDX-License-Identifier: GPL-"..., 2051) = 2051
[pid 47840] close(3)                    = 0
[pid 47840] openat(AT_FDCWD, "/builddir/build/BUILDROOT/kernel-5.3.0-0.rc0.git7.1.fc31.aarch64/usr/src/debug/kernel-5.2.fc31/linux-5.3.0-0.rc0.git7.1.fc31.aarch64/include/linux/sunrpc/xprtrdma.h", O_RDONLY) = 3
[pid 47840] fcntl(3, F_SETFD, FD_CLOEXEC) = 0
[pid 47840] read(3, "/* SPDX-License-Identifier: GPL-"..., 3021) = 3021
...

some kind of loop in debuginfo generation?
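
One way to tell a real loop from a slow single pass would be to save the strace output and count how often each path is opened. A minimal sketch, assuming one syscall per line (not terminal-wrapped); the helper name is invented and the sample paths are shortened stand-ins, not lines from the actual trace.

```python
import re
from collections import Counter

# Matches the path argument of openat(AT_FDCWD, "...") in strace output.
OPENAT_RE = re.compile(r'openat\(AT_FDCWD, "([^"]+)"')

def repeated_opens(strace_lines):
    """Return {path: count} for every path opened more than once."""
    counts = Counter()
    for line in strace_lines:
        match = OPENAT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return {path: n for path, n in counts.items() if n > 1}

# Hypothetical, shortened sample for illustration only.
sample = [
    '[pid 1] openat(AT_FDCWD, "include/linux/sunrpc/xdr.h", O_RDONLY) = 3',
    '[pid 1] read(3, "/* SPDX-License-Identifier: GPL-"..., 100) = 100',
    '[pid 1] openat(AT_FDCWD, "include/linux/sunrpc/xprt.h", O_RDONLY) = 3',
    '[pid 1] openat(AT_FDCWD, "include/linux/sunrpc/xdr.h", O_RDONLY) = 3',
]
print(repeated_opens(sample))
```

If the same headers come back with large counts, that points at a loop; if every path shows up once, rpmbuild is more likely just walking a very large tree slowly.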

So https://koji.fedoraproject.org/koji/taskinfo?taskID=36348544 did eventually finish. I suspect there's an underlying issue with debuginfo generation. I'm going to close this ticket for now; we can reopen it if we start hitting this again.

Metadata Update from @labbott:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

4 years ago
