#2026 Ensure that /var/cache/mock has enough space
Closed: Fixed 2 years ago by praiskup. Opened 2 years ago by churchyard.

For a few builds in a row in https://copr.fedorainfracloud.org/coprs/churchyard/prompt-toolkit-3.0.24-f35/package/python-nbconvert/

I got:

[SKIPPED] texlive-williams-svn15878.0-45.fc35.noarch.rpm: Already downloaded   
[MIRROR] texlive-willowtreebook-svn54866-45.fc35.noarch.rpm: Curl error (23): Failed writing received data to disk/application for https://kojipkgs.fedoraproject.org/repos/f35-build/latest/x86_64/toplink/packages/texlive/2021/45.fc35/noarch/texlive-willowtreebook-svn54866-45.fc35.noarch.rpm [Failure writing output to destination]
[FAILED] texlive-willowtreebook-svn54866-45.fc35.noarch.rpm: Curl error (23): Failed writing received data to disk/application for https://kojipkgs.fedoraproject.org/repos/f35-build/latest/x86_64/toplink/packages/texlive/2021/45.fc35/noarch/texlive-willowtreebook-svn54866-45.fc35.noarch.rpm [Failure writing output to destination]
(4518-4519/4694): te 96% [=================== ]  41 MB/s | 2.4 GB     00:01 ETA
Error: Error downloading packages:
  Curl error (23): Failed writing received data to disk/application for https://kojipkgs.fedoraproject.org/repos/f35-build/latest/x86_64/toplink/packages/texlive/2021/45.fc35/noarch/texlive-willowtreebook-svn54866-45.fc35.noarch.rpm [Failure writing output to destination]

Is it possible that the builder does not have enough space?

This is a fedora-35-x86_64 chroot with http://kojipkgs.fedoraproject.org/repos/f35-build/latest/$basearch/ added as an additional repo, building python-nbconvert from the f35 branch of Fedora's dist-git.
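
For reference, a rough local reproduction of this setup could look something like the following. This is only a sketch, not the exact copr-rpmbuild invocation: it assumes a mock version that supports --addrepo, and the SRPM path is illustrative.

# Sketch: rebuild the package locally with the extra Koji repo enabled
mock -r fedora-35-x86_64 \
     --addrepo=https://kojipkgs.fedoraproject.org/repos/f35-build/latest/x86_64/ \
     --rebuild python-nbconvert-*.src.rpm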


Removing the f35-build Koji repo helps. I suspect this might be a problem with that repo, but OTOH it says "Failed writing received data to disk".

Thanks for the report.

Could this be some random builder failure? Or were you able to reproduce
this? (note that the same builder may be used repeatedly, if the builds are
submitted quickly enough after each other)

For a few builds in a row

Ok, this was not a random failure.

Have you tried to reproduce this locally? I'm afraid this is unlikely to be a problem in Mock.
Bootstrap was ON in that build, so such builds are sensitive to any pre-release bugs
in the DNF/RPM stack.

Also, the package that failed came from the Koji "local" build repo... and from what
I've heard, the Koji HTTP server might cut some connections when it is overloaded,
in peak situations (though I'd expect a different message then; this looks more like
a client-side write issue than a read issue).

OK, this is a small / partition problem.

The /var/lib/mock data go to Mock's tmpfs mount point, but the /var/cache/mock data don't (they stay on /). And the hypervisor worker that failed for me has this:

[root@fedora ~]# du --max-depth 1 /var/cache/mock -h
2.8G    /var/cache/mock/fedora-35-x86_64
2.8G    /var/cache/mock
[root@fedora ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        1.9G     0  1.9G   0% /dev
tmpfs           2.0G     0  2.0G   0% /dev/shm
tmpfs           783M  8.6M  775M   2% /run
/dev/vda1       4.9G  4.4G  284M  94% /
tmpfs           2.0G     0  2.0G   0% /tmp
/dev/sr0        364K  364K     0 100% /config
/dev/vdb1        16G   45M   15G   1% /var/lib/copr-rpmbuild
tmpfs           392M     0  392M   0% /run/user/0
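
To double-check which filesystem backs each path on a builder, a verification like this could be used (just a suggested check; the actual evidence from this ticket is the du/df output above):

# Show the filesystem each directory actually lives on
findmnt --target /var/cache/mock
findmnt --target /var/lib/mock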

We have several kinds of builders, e.g. AWS builders that have larger / partitions.
But AWS builders are only used when our on-premise builders are already busy, and
the use of AWS builders cannot be enforced by the user.

So the reason why disabling the local (f35-build) repo helps is that its metadata don't
have to be stored in /var/cache/mock.

We need to use tmpfs for /var/cache/mock.
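
A minimal sketch of what that could look like on a builder follows; the mount size is an assumption for illustration, not a value from this ticket.

# One-off mount for testing (size is illustrative only)
mount -t tmpfs -o size=4g tmpfs /var/cache/mock

# Or persistently via /etc/fstab (again, size is illustrative):
# tmpfs  /var/cache/mock  tmpfs  defaults,size=4g  0 0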

Metadata Update from @praiskup:
- Issue tagged with: bug

2 years ago

BTW I've started 3 of the builds in rapid succession to rule out the possibility that this is just one rogue builder.

Metadata Update from @praiskup:
- Issue assigned to schlupov

2 years ago

As pointed out by @schlupov, we already had: https://pagure.io/fedora-infra/ansible/pull-request/110

So now we should enable this for all builder VMs.

Metadata Update from @praiskup:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 years ago
