#9220 Failure to start build during mass rebuild
Closed: Fixed 3 years ago by mohanboddu. Opened 4 years ago by jjames.

  • Describe the issue
    No build was started for the GAPDoc package during the mass rebuild. According to https://koji.fedoraproject.org/koji/taskinfo?taskID=41124751 the SRPM was built, but then RPM was unable to read the headers. I notice that the SRPM build machine was s390x, so this may have been part of the general s390x wonkiness we saw at the beginning of the mass rebuild.
  • When do you need this? (YYYY/MM/DD)
    N/A
  • When is this no longer needed or useful? (YYYY/MM/DD)
    N/A
  • If we cannot complete your request, what is the impact?
    Nothing. I'll just manually build the package.

Something similar happened with GtkAda. A release bump was committed to Git on 2020-01-28. After the mass rebuild was finished and merged, no build attempt had been made according to the Koji web interface. I ran "fedpkg build" (to get it installable again after the GCC upgrade), and it was built without any problem.

I can't see any indication that even an SRPM was built for GtkAda. On the other hand I can't figure out how to find the task that built an SRPM for GAPDoc either (except through the direct link), so the same thing may or may not have happened with GtkAda.

It looks like the same thing happened to Macaulay2: https://koji.fedoraproject.org/koji/taskinfo?taskID=41124983. "GenericError: rpm's header can't be extracted: 5 (rpm error: error reading package header)"

More:
GeoIP: https://koji.fedoraproject.org/koji/taskinfo?taskID=41124811
GeoIP-GeoLite-data: https://koji.fedoraproject.org/koji/taskinfo?taskID=41124812
GeographicLib: https://koji.fedoraproject.org/koji/taskinfo?taskID=41124826
GitPython: https://koji.fedoraproject.org/koji/taskinfo?taskID=41124828

I was trying to find GtkAda, @rombobeorn, but it looks like I haven't tried high enough numbers yet. :-) The machine creating the SRPM was s390x in every case I have found so far.

The problem here is that koji/our messaging plugin doesn't emit any messages on these. The build never 'started' so it couldn't fail.

We will need to find some way to identify them. (database query?)

@mohanboddu @tkopecek any ideas?

Yep, that makes sense. If the rebuilt SRPM is corrupted, the build is never started (the NVR can only be extracted in the next step).
Do we have any of these rebuilt SRPMs to check whether they are really corrupted somehow? Could it be that some older rpm binary on the hub/builders is not able to read them?
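
For anyone who does get hold of one of these SRPMs, a cheap first check is whether the file even starts like an RPM. The following is only a minimal sketch: it looks at the 96-byte lead and the intro of the signature header and nothing else, so an "ok" here does not prove the package is readable. The function name and return strings are made up for illustration; running rpm itself against the file is still the real test.

```python
import struct

RPM_LEAD_MAGIC = b"\xed\xab\xee\xdb"  # first 4 bytes of an RPM file
RPM_HEADER_MAGIC = b"\x8e\xad\xe8"    # start of each header structure
LEAD_SIZE = 96                         # the lead has a fixed size
INTRO_SIZE = 16                        # magic(3) version(1) reserved(4) nindex(4) hsize(4)
INDEX_ENTRY_SIZE = 16


def quick_rpm_check(data: bytes) -> str:
    """Shallow structural check of the start of an RPM file (hypothetical helper)."""
    if len(data) < LEAD_SIZE + INTRO_SIZE:
        return "truncated"
    if data[:4] != RPM_LEAD_MAGIC:
        return "bad lead magic"
    if data[LEAD_SIZE:LEAD_SIZE + 3] != RPM_HEADER_MAGIC:
        return "bad signature header magic"
    # Index count and data size are big-endian 32-bit integers.
    nindex, hsize = struct.unpack(">II", data[LEAD_SIZE + 8:LEAD_SIZE + 16])
    needed = LEAD_SIZE + INTRO_SIZE + INDEX_ENTRY_SIZE * nindex + hsize
    if len(data) < needed:
        return "truncated signature header"
    return "ok"
```

Usage would be something like `quick_rpm_check(open(path, "rb").read())` on a downloaded SRPM.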

It looks like it fails only if the parent task ran on s390x.

Yes, it seems like it's some kind of i/o issue on s390x builders... sometimes... :(

So, how can we identify these... builds that failed in the f32-rebuild tag and never produced an SRPM?

The DB is easiest here. First guess:

SELECT parent FROM task WHERE parent in (
    SELECT id FROM task WHERE
        method = 'build' AND
        state = 5 AND
        create_time > NOW() - '30 days'::interval
        request NOT LIKE '%<name>scratch</name>%' AND
        request LIKE '%<string>f32-rebuild</string>%'
    )
GROUP BY
    parent
HAVING
    count(*) = 1;

It should return all non-scratch build tasks for the f32-rebuild target that failed on their first subtask (building the SRPM).

There is an AND missing on line 5? With that added:

koji=# SELECT parent FROM task WHERE parent in (
koji(#    SELECT id FROM task WHERE
koji(#         method = 'build' AND
koji(#         state = 5 AND
koji(#         create_time > NOW() - '30 days'::interval AND
koji(#         request NOT LIKE '%<name>scratch</name>%' AND
koji(#         request LIKE '%<string>f32-rebuild</string>%'
koji(#     )
koji-# GROUP BY
koji-#     parent
koji-# HAVING
koji-#     count(*) = 1;
 parent 
--------
(0 rows)

Sorry about that. Yes, it is missing. The query returns valid values for me (in brew). Anyway, it looks like a one-month window is not long enough here: the linked builds are from January. So play with the create_time condition (maybe 3 months?).
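
To make the effect of the create_time window concrete, here is a toy reproduction in Python with an in-memory SQLite database. It is a sketch under loud assumptions: the `task` table below has only the columns the query touches (the real Koji schema has more), the rows are invented, and SQLite's `datetime('now', ...)` stands in for PostgreSQL's `NOW() - interval`. The helper names are hypothetical.

```python
import sqlite3


def make_toy_db():
    """Build a tiny stand-in for the task table (simplified, invented rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task (id INTEGER PRIMARY KEY, parent INTEGER, "
        "method TEXT, state INTEGER, create_time TEXT, request TEXT)"
    )
    req = "<string>f32-rebuild</string>"
    rows = [
        # Failed build whose only subtask is the SRPM step: should be reported.
        (1, None, "build", 5, req),
        (2, 1, "buildSRPMFromSCM", 5, ""),
        # Failed build that got past the SRPM step (two subtasks): ignored.
        (3, None, "build", 5, req),
        (4, 3, "buildSRPMFromSCM", 2, ""),
        (5, 3, "buildArch", 5, ""),
        # Scratch build: ignored.
        (6, None, "build", 5, req + "<name>scratch</name>"),
        (7, 6, "buildSRPMFromSCM", 5, ""),
    ]
    for row in rows:
        # Every toy task was created 60 days ago.
        conn.execute(
            "INSERT INTO task VALUES (?, ?, ?, ?, datetime('now', '-60 days'), ?)",
            row,
        )
    return conn


def failed_at_srpm(conn, target, days):
    """IDs of failed non-scratch build tasks that have exactly one subtask."""
    sql = """
        SELECT parent FROM task WHERE parent IN (
            SELECT id FROM task WHERE
                method = 'build' AND
                state = 5 AND
                create_time > datetime('now', ?) AND
                request NOT LIKE '%<name>scratch</name>%' AND
                request LIKE ?
        )
        GROUP BY parent
        HAVING count(*) = 1
        ORDER BY parent
    """
    params = (f"-{days} days", f"%<string>{target}</string>%")
    return [row[0] for row in conn.execute(sql, params)]
```

With the toy rows all dated 60 days back, a 30-day window finds nothing while a 90-day window picks up task 1, which mirrors the empty result above.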

ah yes, you're quite right...

so, 88 of them:

41163489
41128508
41124991
41216515
41282491
41208754
41171915
41128598
41257389
41124797
41212870
41188703
41248523
41124830
41192152
41124812
41124971
41181342
41243740
41236411
41124884
41268615
41124811
41268468
41321271
41124904
41124913
41124828
41257442
41124826
41152509
41125023
41140043
41189544
41237728
41315319
41125030
41135928
41196953
41251003
41124981
41257541
41315321
41218716
41214117
41196229
41132037
41321416
41124847
41124979
41124963
41124751
41271782
41124926
41196371
41124993
41263502
41239903
41125947
41125060
41196983
41124943
41187518
41144956
41206675
41276909
41124958
41265030
41191987
41124983
41318331
41285715
41199688
41125040
41182601
41186184
41263765
41124940
41129028
41206613
41321621
41125939
41124399
41125038
41214085
41124889
41151661
41278236
(88 rows)

So, probably, we want to make a new tag (f32-srpm-failures?), rebuild everything in there, then merge it into f32-updates-testing-candidate and make one big update with them?
(The merge should filter out the ones that already have a newer build in the tag, I think.)

At this point in the release cycle, that could do more harm than good.

I suspect we could: convert the list to package names, check which packages were built after the mass rebuild, open Bugzilla bugs for those that weren't, and let the maintainers decide whether a rebuild is worth it now or not.
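
For the first step, a minimal sketch of turning a build task's source URL into a package name. It assumes the source is an SCM URL shaped like `git+https://src.fedoraproject.org/rpms/<name>.git#<commit>`; fetching that URL for each task ID from Koji is left out, and the function name is hypothetical.

```python
from urllib.parse import urlsplit


def package_from_scm_url(url: str) -> str:
    """Extract the package name from an SCM URL (assumed shape, see above)."""
    # urlsplit drops the '#<commit>' fragment and any '?...' query for us.
    path = urlsplit(url).path                 # e.g. "/rpms/GAPDoc.git"
    name = path.rstrip("/").rsplit("/", 1)[-1]
    # Strip only a trailing ".git", so names that contain "git" elsewhere
    # (GitPython, golang-github-...) are left untouched.
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return name
```

Anchoring the strip to the end of the name matters here: an unanchored pattern that matches "git" anywhere would mangle several of the affected packages.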

I suppose. I wouldn't think a rebuild would be a problem now, and if it is, it would be better that we find that out now rather than after release... but sure, we can go the bug route.

Anyhow, here's the list of packages if someone wants to check whether they have builds newer than the mass rebuild:

0ad
abcde
aboot
aespipe
asterisk
cava
Charliecloud
CPUFreqUtility
cryptlib
eigen3
GAPDoc
GarminPlugin
gdeploy
GeographicLib
GeoIP
GeoIP-GeoLite-data
gfalFS
GitPython
golang-github-boombuler-barcode
golang-github-xo-dburl
GoldenCheetah
GtkAda
InsightToolkit
Io-language
Java-WebSocket
jrnl
JSCookMenu
js-php-date-formatter
kernel-headers
LabPlot
LaTeXML
lensfun
L-function
libcorrect
libgeotiff
libjpeg-turbo
LibRaw
libtpms
libvirt
LinLog
Lmod
Macaulay2
MagicPoint
Mars
mediawiki-backtick-code
mesos
mingw-eigen3
mingw-expat
morse2txt
MUMPS
MUSIC
NearTree
NetworkManager
NetworkManager-fortisslvpn
NetworkManager-iodine
NetworkManager-openvpn
nodejs-info-symbol
nodejs-interpret
nodejs-oauth
nodejs-win-spawn
ocaml-dbus
ocaml-fileutils
osbs-client
perl-AutoXS-Header
perl-XML-DOM
php-channel-phpdoc
php-pecl-mcrypt
psblas3
python3-gssapi
python3-prctl
python-fields
python-moksha-hub
pyusb
R-car
R-chron
rubygem-asciidoctor
rubygem-rack-attack
rubygem-rails-dom-testing
rubygem-ruby-hmac
rubygem-xpath
rust-block-padding
rust-osstrtools
rust-pest_generator
sagemath
swtpm
tix
weechat
xvarstar

This is not needed anymore. Closing the ticket.

Metadata Update from @mohanboddu:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)
