https://koji.fedoraproject.org/koji/taskinfo?taskID=40438531 and https://koji.fedoraproject.org/koji/buildinfo?buildID=1428576
Kevin Fenzi noted the last time this happened that ceph builds take ~50G of space.
see https://pagure.io/fedora-infrastructure/issue/8518
Looking over the builders, about 80 of the 160 have less than 50 GB of disk space free, which is not enough for a ceph build. Most of the builds on them are from the last 24 hours, so this is just everyone firing off disk-heavy builds and running into each other.
I do not have a solution that can be done over a weekend or without a lot of planning on releng policies versus infrastructure resources. As of
Sun Jan 12 18:44:08 UTC 2020
buildvm-09.phx2.fedoraproject.org          /dev/vda2  128G  119G  9.3G  93%  /
buildvm-05.phx2.fedoraproject.org          /dev/vda2  128G  102G   27G  80%  /
buildvm-04.phx2.fedoraproject.org          /dev/vda2  128G  104G   25G  81%  /
buildvm-23.phx2.fedoraproject.org          /dev/vda2  128G  119G  9.4G  93%  /
buildvm-aarch64-12.arm.fedoraproject.org   /dev/vda4  129G  113G  8.7G  93%  /
buildvm-aarch64-19.arm.fedoraproject.org   /dev/vda4  129G  113G  9.2G  93%  /
buildvm-ppc64le-04.ppc.fedoraproject.org   /dev/vda5  136G  121G  8.8G  94%  /
buildvm-s390x-19.s390.fedoraproject.org    /dev/vda3   96G   76G   16G  83%  /
buildvm-s390x-18.s390.fedoraproject.org    /dev/vda3   96G   84G  7.4G  92%  /
buildvm-s390x-24.s390.fedoraproject.org    /dev/vda3   96G   83G  7.7G  92%  /
these are the worst system offenders.
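For anyone who wants to repeat this check, here is a rough sketch of how a listing like the one above could be filtered for builders that are short on space. It assumes each row has the form "host device size used avail use% mount" with `df -h` style sizes; the function names and the threshold parameter are made up for illustration and are not part of any Fedora infrastructure tooling.

```python
def parse_size(text):
    """Convert a df -h style size such as '9.3G' into gigabytes."""
    units = {"M": 1 / 1024, "G": 1.0, "T": 1024.0}
    return float(text[:-1]) * units[text[-1]]


def builders_below(rows, needed_gb=50.0):
    """Return (host, free_gb) pairs with less than needed_gb free, fullest first."""
    flagged = []
    for row in rows:
        # Assumed row layout: host device size used avail use% mount
        host, _device, _size, _used, avail, _pct, _mount = row.split()
        free = parse_size(avail)
        if free < needed_gb:
            flagged.append((host, free))
    return sorted(flagged, key=lambda item: item[1])


# Three rows copied from the listing above.
ROWS = [
    "buildvm-09.phx2.fedoraproject.org /dev/vda2 128G 119G 9.3G 93% /",
    "buildvm-05.phx2.fedoraproject.org /dev/vda2 128G 102G 27G 80% /",
    "buildvm-s390x-18.s390.fedoraproject.org /dev/vda3 96G 84G 7.4G 92% /",
]

for host, free in builders_below(ROWS):
    print(f"{host}: {free:.1f}G free")
```

With the default 50G threshold (the approximate space a ceph build needs, per the comment above) all three sample rows are flagged.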
JFYI https://koji.fedoraproject.org/koji/taskinfo?taskID=40345807 / https://koschei.fedoraproject.org/build/7569428 / https://koschei.fedoraproject.org/package/pypy3?collection=f32 failed with missing space on aarch64.
Since we were not seeing this in the past, I wonder if the restarts over the holidays caused the kojid instances on the builders to lose track of which old buildroots they had.
So, I went and manually removed them all, then restarted kojid on all the builders.
Let's see if that solves the issue. If not, we will have to look more closely at what is in the buildroots left on the full builders and explore what's happening with koji upstream.
Please re-open or file a new ticket if you see any of them hitting this again.
Metadata Update from @kevin: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)
OK, I am going to reopen this because I checked this morning and some of the builders are back where they were yesterday. The problem is most pronounced on the s390 boxes:
buildvm-s390x-01.s390.fedoraproject.org    0
buildvm-s390x-02.s390.fedoraproject.org    0
buildvm-s390x-03.s390.fedoraproject.org    0
buildvm-s390x-04.s390.fedoraproject.org    0
buildvm-s390x-05.s390.fedoraproject.org    0
buildvm-s390x-06.s390.fedoraproject.org    0
buildvm-s390x-07.s390.fedoraproject.org    0
buildvm-s390x-08.s390.fedoraproject.org    0
buildvm-s390x-09.s390.fedoraproject.org    0
buildvm-s390x-10.s390.fedoraproject.org    0
buildvm-s390x-11.s390.fedoraproject.org    0
buildvm-s390x-12.s390.fedoraproject.org    1
buildvm-s390x-13.s390.fedoraproject.org    1
buildvm-s390x-14.s390.fedoraproject.org    0
buildvm-s390x-15.s390.fedoraproject.org    0
buildvm-s390x-16.s390.fedoraproject.org    2
buildvm-s390x-17.s390.fedoraproject.org    1
buildvm-s390x-18.s390.fedoraproject.org   95
buildvm-s390x-19.s390.fedoraproject.org   62
buildvm-s390x-20.s390.fedoraproject.org   59
buildvm-s390x-21.s390.fedoraproject.org  104
buildvm-s390x-22.s390.fedoraproject.org   75
buildvm-s390x-23.s390.fedoraproject.org   82
buildvm-s390x-24.s390.fedoraproject.org   58
Those last 7 boxes seem to have gotten most of the builds, so two of them are almost out of build space. On the other arches the load is spread out, but a few still have more space used, and they seem to be the ones I saw running high yesterday.
buildvm-aarch64-08.arm.fedoraproject.org   /dev/vda4  129G  49G  73G  41%  /
buildvm-aarch64-18.arm.fedoraproject.org   /dev/vda4  129G  57G  65G  47%  /
buildvm-ppc64le-12.ppc.fedoraproject.org   /dev/vda5  136G  58G  72G  45%  /
Metadata Update from @smooge: - Issue status updated to: Open (was: Closed)
That's expected, but is it a problem?
kojid keeps track of the buildroots on its builder and currently cleans them up after a successful build or, for failed builds, after 8 hours.
We cannot keep them at 0% full all the time and still be able to look at failed buildroots.
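To make the retention rule concrete, here is a minimal sketch of the policy as described in this comment: remove a buildroot right away after a successful build, and keep a failed one for 8 hours so it can be inspected. This is not kojid's actual code; the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

# Retention window for failed buildroots, as stated in the comment above.
FAILED_RETENTION = timedelta(hours=8)


def should_remove(succeeded, finished_at, now):
    """Decide whether a finished buildroot can be cleaned up (illustrative only)."""
    if succeeded:
        # Successful builds are cleaned up immediately.
        return True
    # Failed builds stay around for inspection until the window expires.
    return now - finished_at >= FAILED_RETENTION


now = datetime(2020, 1, 13, 12, 0, 0)
print(should_remove(True, now - timedelta(minutes=5), now))
print(should_remove(False, now - timedelta(hours=3), now))
print(should_remove(False, now - timedelta(hours=9), now))
```

Under this rule a builder that takes many failing builds within a few hours will hold on to all of their buildroots at once, which is why some disk usage is expected.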
Metadata Update from @smooge: - Issue priority set to: Waiting on Assignee (was: Needs Review)
Well, if those are failed builds, why are they only showing up on 7 builders? I would have thought they would be spread over all 24 in some other pattern, since they are spread out much more on ppc64le, aarch64, armv7 and x86_64.
Because those 7 hosts have a higher "weight" (4.0 vs 2.0): they are kvm instances, whereas 01-14 are zvm instances. So koji gives them more jobs than the others. Since almost everything in Fedora also does an s390x build, those 7 builders end up doing nearly every Fedora build. I guess we could move more to the zvm builders, but they were the ones showing the weird 'unable to unpack src.rpm' under heavy load.
Or we could drop the other builders from the list if they aren't getting used? Since I found a non-problem, I think we can close this.
No, this still seems to be a problem.
So, I set all the s390x builders to have a weight of 3.0. Builds should spread over them better now, rather than clustering on the kvm ones and filling up their disks.