#9232 Ticket to collect issues and problems with s390x builders
Opened a month ago by mtasaka. Modified a day ago

Currently the s390x builders have very limited resources, and lots of tasks are waiting for their s390x build to start. Roughly, it is taking about 2 hours for an s390x build to start. It seems many s390x build-arch hosts are now disabled. Would you investigate the current status? Thank you.


The builders are on a shared server and those services are utilized by multiple groups. We do not have any ETA on when this will be fixed.

Metadata Update from @smooge:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: groomed, high-gain, medium-trouble, monitoring, ops

a month ago

So, we have done all we can on our end to make the s390x builders happy and reliable.

The problem is that they share hardware with other tenants, and those have priority over us, so our builders get starved for cpu and io cycles.

The mainframe admins have asked us to collect problems or issues we see to better inform them of our issues.

So, I'd like to use this ticket to collect some of those issues.

If you get a failed build because of an s390x issue, please note the exact time, the builder name, and the link to the koji task that failed.
If your build was delayed a long time, please note the koji task link and the exact time here.

We will collect these and get them to upstream admins.

Thanks.

Metadata Update from @kevin:
- Issue untagged with: ops

a month ago

I've been asked to try and collect issues that package maintainers are
hitting on these builders and provide them to mainframe admins so they
can understand the impact on us and prioritize more resources or other
corrective measures.

To that end, if you are a maintainer and:

  • A build on s390x being slow affects you (needed for another build,
    important bug fix, etc) in a serious way

or

  • a build on an s390x builder fails in some odd way that is NOT related
    to your package (unable to download src.rpm, checksum mismatches, etc).

I'd love for you to note:

  • the link to the failed/slow task
  • The time (UTC preferred)
  • which exact builder it was
  • what the issue was
  • how it impacted your fedora work

For gcc, s390x has lately been by far the slowest arch. Part of it is the low %{?_smp_mflags} it offers (e.g. in several of the last few builds, all other arches gave -j5, -j6, -j8 or higher, while s390x gave only -j2), but that isn't the only factor. E.g. I had to cancel the previous gcc f33 build because it looked to be stuck on s390x after more than 30 hours - https://koji.fedoraproject.org/koji/taskinfo?taskID=50185070 - buildvm-s390x-22.s390.fedoraproject.org - Wed, 26 Aug 2020 11:38:51 UTC to Thu, 27 Aug 2020 17:56:46 UTC (the second slowest arch finished in less than 13 hours; on a fast box one can build gcc in 2 hours). After resubmitting the job, all other arches finished reasonably quickly, but s390x has now been building for more than 24 hours (not stuck, just very slow progress) - https://koji.fedoraproject.org/koji/taskinfo?taskID=50264059 - buildvm-s390x-06.s390.fedoraproject.org - started Thu, 27 Aug 2020 17:59:11 UTC, total time 24:20 right now and counting. Earlier this year, armv7hl used to be the slowest arch, but it no longer is.
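
For anyone tabulating durations from reports like this one, here is a minimal sketch of turning a pair of koji-style timestamps into elapsed hours. It uses only the Python standard library. The function names and the format constant are made up for illustration, and the format string is an assumption based on how the timestamps are written in this ticket.

```python
from datetime import datetime, timezone

# Assumed timestamp layout, taken from the reports in this ticket,
# e.g. "Wed, 26 Aug 2020 11:38:51 UTC".
KOJI_TIME_FORMAT = "%a, %d %b %Y %H:%M:%S UTC"


def parse_koji_time(text):
    """Parse a timestamp in the layout above into an aware UTC datetime."""
    parsed = datetime.strptime(text, KOJI_TIME_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


def elapsed_hours(started, ended):
    """Return the time between two timestamps, in hours."""
    delta = parse_koji_time(ended) - parse_koji_time(started)
    return delta.total_seconds() / 3600


# The cancelled gcc build on buildvm-s390x-22 reported above.
hours = elapsed_hours(
    "Wed, 26 Aug 2020 11:38:51 UTC",
    "Thu, 27 Aug 2020 17:56:46 UTC",
)
print(f"{hours:.1f} hours")  # prints "30.3 hours"
```

That matches the "more than 30 hours" figure for the cancelled task.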

Just FYI, I have been looking at that builder... here's a vmstat:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 4  0  38920 2708604     12 4506840    0    0     0   124  198  518 11  9  1  0 80
 3  0  38920 2712104     12 4506820    0    0     0    80  244  677 23 20  0  0 58
 2  0  38920 2712964     12 4506920    0    0     0   108  253  837 27 22  1  0 51
 2  0  38920 2705416     12 4506964    0    0     0    80  160  579  8 13  1  0 78
 2  0  38920 2714308     12 4506948    0    0     0    76  182  565  8 13  3  0 75

You can see the last col is 'stolen'... which basically means the hypervisor is just not giving it many cpu cycles at all. ;(
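
To put a number on that, here is a rough sketch that averages the 'st' (steal) column of a captured vmstat sample. It assumes the default vmstat layout shown above, where a header row of column names is followed by all-numeric sample rows. The function name and the SAMPLE constant are made up for illustration.

```python
def average_steal(vmstat_text):
    """Return the mean of the 'st' column across all sample rows."""
    columns = None
    steal_values = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The column-name row is the one that contains both 'us' and 'st'.
        if "us" in fields and "st" in fields:
            columns = fields
            continue
        # Sample rows are all digits and line up with the column names.
        is_sample = (
            columns is not None
            and len(fields) == len(columns)
            and all(field.isdigit() for field in fields)
        )
        if is_sample:
            steal_values.append(int(fields[columns.index("st")]))
    if not steal_values:
        raise ValueError("no vmstat sample rows found")
    return sum(steal_values) / len(steal_values)


# The sample pasted above.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 4  0  38920 2708604     12 4506840    0    0     0   124  198  518 11  9  1  0 80
 3  0  38920 2712104     12 4506820    0    0     0    80  244  677 23 20  0  0 58
 2  0  38920 2712964     12 4506920    0    0     0   108  253  837 27 22  1  0 51
 2  0  38920 2705416     12 4506964    0    0     0    80  160  579  8 13  1  0 78
 2  0  38920 2714308     12 4506948    0    0     0    76  182  565  8 13  3  0 75
"""

print(average_steal(SAMPLE))  # prints 68.4
```

So across those five samples, roughly two thirds of the cpu time was stolen.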

I've been asked to try and collect issues

Including ones that are a few weeks old, or only new ones from now on? These were transient failures in the latest mass rebuild:

For both of the above:

  • "error reading package header"
  • I had to spend time investigating the failures, resubmitting the rebuilds, and fending off spurious FTBFS reports in Bugzilla.

https://koji.fedoraproject.org/koji/taskinfo?taskID=47965464
Started Tue, 28 Jul 2020 21:08:45 UTC
Host buildvm-s390x-10.s390.fedoraproject.org
"_rpm.error: error reading package header"

Took me two hours to read the source code of rpm python lib, trying to figure out what happened, since I don't really know what s390x arch is :)

Blender failed to build on the s390x architecture, prompting its temporary
exclusion.
* Started: Mon, 24 Aug 2020 14:17:00 UTC
* Host: buildvm-s390x-05.s390.fedoraproject.org
* Build url: https://koji.fedoraproject.org/koji/taskinfo?taskID=50066919
* Build.log tail below

gmake[2]: Leaving directory '/builddir/build/BUILD/blender-2.83.5/s390x-redhat-linux-gnu'
gmake[2]: *** [source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/build.make:84: source/blender/makesrna/intern/rna_ID_gen.c] Aborted (core dumped)
gmake[1]: *** [CMakeFiles/Makefile2:5805: source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/all] Error 2
gmake[1]: Leaving directory '/builddir/build/BUILD/blender-2.83.5/s390x-redhat-linux-gnu'
gmake: *** [Makefile:185: all] Error 2

this doesn't seem to be an infra problem, the makesrna tool is crashing

The s390x builders are running out of disk space and I can't get cross-gcc builds to complete:

tasks: 50796290, 50942816, 50796717, 50942815 plus others I didn't record

please see #9265

The mainframe folks tell me that they identified one possible cause of our issues and remediated it: our LPARs were on 7200rpm disk instead of 'fast' disk and have now been moved to 'fast' disk.

Please let us know if you see any issues now.

From a quick glance it does seem a lot better to me, but there's not too many builds happening right now.
