#1418 Issues unpacking src.rpm on fedoraproject koji on fedora builders
Closed: Fixed 2 years ago by tkopecek. Opened 3 years ago by kevin.

We have been hitting issues unpacking src.rpm's of late in the Fedora koji instance.

This happens on our s390x builders, possibly when they are under load or the remote link to them is slow/busy.

See:

https://pagure.io/fedora-infrastructure/issue/7753
https://pagure.io/fedora-infrastructure/issue/7581
https://pagure.io/fedora-infrastructure/issue/7729

For example, from https://koji.fedoraproject.org/koji/taskinfo?taskID=34567975
https://kojipkgs.fedoraproject.org//work/tasks/8008/34568008/root.log

DEBUG util.py:677: Executing command: ['/bin/rpm', '-Uvh', '--nodeps', '/builddir/build/originals/eclipse-4.11-7.module_f28+4131+bc1a8827.src.rpm'] with env {'TERM': 'vt100', 'SHELL': '/bin/bash', 'HOME': '/builddir', 'HOSTNAME': 'mock', 'PATH': '/usr/bin:/bin:/usr/sbin:/sbin', 'PROMPT_COMMAND': 'printf "\033]0;<mock-chroot>\007"', 'PS1': '<mock-chroot> \s-\v\$ ', 'LANG': 'C.UTF-8'} and shell False
DEBUG util.py:418: Unsharing. Flags: 134217728
DEBUG util.py:585: Updating / installing...
DEBUG util.py:585: eclipse-1:4.11-7.module_f28+4131+bc1a8########################################
DEBUG util.py:583: BUILDSTDERR: error: unpacking of archive failed on file /builddir/build/SOURCES/eclipse-platform-sources-4.11RC2.tar.xz;5cca4e57: cpio: read failed - Inappropriate ioctl for device
DEBUG util.py:583: BUILDSTDERR: error: /builddir/build/originals/eclipse-4.11-7.module_f28+4131+bc1a8827.src.rpm cannot be installed

I am not sure what could be done here, but it seems like the src.rpm it has is somehow corrupt as downloaded? The src.rpm in koji unpacks fine.

We are using a http proxy for packages, but does this src.rpm download happen via xmlrpc? Could perhaps the hub do a checksum of it and have a client check it after download and retry if it didn't match?

Happy to gather more info.

EDITED TO ADD: Looks like it also happens on armv7 builders:
https://koji.fedoraproject.org/koji/taskinfo?taskID=34573650
https://pagure.io/fedora-infrastructure/issue/7581


It's interesting to me that all of these reports are for failures to install the source package. I believe koji (or maybe it's mock at this point) could easily verify the package after downloading it and retry at that point before attempting to install it. It's not obvious how to do this, but

rpm2cpio foo.src.rpm | cpio -i --only-verify-crc

will at least fail on truncated archives (with cpio: premature end of file and a return code of 2). Of course, just looking at the file size might work just as well.

does this src.rpm download happen via xmlrpc?

No, just a regular httpd GET.

Metadata Update from @yulwang:
- Issue assigned to dgregor
- Issue tagged with: discussion, feature, groomed

3 years ago

@kevin @tibbs how confident are we that a retry will work? If the corruption is due to server or network load, retrying may make it worse. Is this something that has happened in the past as well or just recently?

@kevin @tibbs how confident are we that a retry will work? If the corruption is due to server or network load, retrying may make it worse. Is this something that has happened in the past as well or just recently?

Is retrying this one download more load that retrying the whole build? Because at the moment all that happens is I am forced to just resubmit my whole build.

Certainly if a retry doesn't work then we would have learned something useful. Currently I'm not sure that we have any reasonable way to know if the problem is in the download or if the SRPM simply wasn't written correctly at some earlier stage of the process, though I think evidence suggests the former since the SRPMs are rarely built in s390x machines while the problem is only seen there. If necessary, the check I posted earlier could be added at other places in the process to verify that the SRPM is OK at each step.

In the examples given above, the unpacking of the SRPM fails on s390x, but succeeds on all other arches. The fact that some arches succeed tells you that the SRPM itself is ok, no?

Another occurrance seen today:

https://koji.fedoraproject.org/koji/taskinfo?taskID=35592274

DEBUG util.py:587:  Updating / installing...
DEBUG util.py:587:  protobuf-3.6.1-6.module_f30+4730+ebb8f########################################
DEBUG util.py:585:  BUILDSTDERR: error: unpacking of archive failed on file /builddir/build/SOURCES/googletest-1.8.1.tar.gz;5d0773c9: cpio: read failed - Inappropriate ioctl for device
DEBUG util.py:585:  BUILDSTDERR: error: /builddir/build/originals/protobuf-3.6.1-6.module_f30+4730+ebb8f206.src.rpm cannot be installed
DEBUG util.py:734:  Child return code was: 1

This still seems to be happening. Do we know if it's only Z/VM s390x builder (01-14) or kvm ones (15-24)?

@sharkcz any ideas on how we might track this down?

Could this be some odd rpm or cpio bug on s390x?

Metadata Update from @kevin:
- Custom field Size adjusted to None

2 years ago

FWIW, I saw two more occurances recently:

https://koji.fedoraproject.org/koji/taskinfo?taskID=37763977
https://koji.fedoraproject.org/koji/taskinfo?taskID=37763055

The relevant part of the logs is from mock_output.log:

Start: build phase for eclipse-4.13-2.module_f31+6519+12cd0b27.src.rpm
Start: build setup for eclipse-4.13-2.module_f31+6519+12cd0b27.src.rpm
Finish: build setup for eclipse-4.13-2.module_f31+6519+12cd0b27.src.rpm
Finish: build phase for eclipse-4.13-2.module_f31+6519+12cd0b27.src.rpm
ERROR: Exception(/var/tmp/koji/tasks/3055/37763055/local/work/tasks/2985/37762985/eclipse-4.13-2.module_f31+6519+12cd0b27.src.rpm) Config(module-eclipse-latest-3120190920120950-d36a98d4-build-17552893-1266828) 0 minutes 0 seconds
INFO: Results and/or logs in: /var/lib/mock/module-eclipse-latest-3120190920120950-d36a98d4-build-17552893-1266828/result
ERROR: Command failed: 
 # /bin/rpm -Uvh --nodeps /builddir/build/originals/eclipse-4.13-2.module_f31+6519+12cd0b27.src.rpm

So from @mbooth's last report and my earlier observation it happens on both kvm and z/vm guests ...

@kevin, I don't recall seeing such problem when doing local builds, but can try again. The only known fact is that it affects large srpms ...

[sharkcz@devel3 firefox]$ truncate -s 340M firefox-69.0.1-3.fc32.src.rpm
[sharkcz@devel3 firefox]$ rpm -ivh firefox-69.0.1-3.fc32.src.rpm 
Updating / installing...
   1:firefox-69.0.1-3.fc32            ################################# [100%]
error: unpacking of archive failed on file /home/sharkcz/rpmbuild/SOURCES/firefox-langpacks-69.0.1-20190918.tar.xz;5d88c18b: cpio: read failed - No such file or directory
error: firefox-69.0.1-3.fc32.src.rpm cannot be installed

symptom is very similar, but the error from cpio is different

But when running mock rebuild (using nspawn chroot) against that shortened srpm, I get the same error

DEBUG util.py:593:  warning: group sharkcz does not exist - using root
DEBUG util.py:595:  #####
DEBUG util.py:593:  error: unpacking of archive failed on file /builddir/build/SOURCES/firefox-langpacks-69.0.1-20190918.tar.xz;5d88c45f: cpio: read failed - Inappropriate ioctl for device
DEBUG util.py:593:  error: /builddir/build/originals/firefox-69.0.1-3.fc32.src.rpm cannot be installed
DEBUG util.py:741:  Child return code was: 1
DEBUG util.py:327:  kill orphans

So I still suspect some infra issue causing incomplete download of the srpm.

Possibly, but why is it only src.rpm downloads? Are those done via xmlrpc?

IMHO the best (maybe only) thing we can is to add/enable some debugging/verification in the koji builder code, to capture more context before the unpacking failure (before even starting mock).

According to Mike's comment earlier in the thread, the downloads are just a regular httpd GET. Would it be possible to look at the web server logs to see if the byte size sent matches what's expected? Also, this is only happening on s390x, correct?

should be on armv7 too, but s390x is the most frequent one, but it's also the most distant (if there is a correllation)

So, the s390x builders all point to a local varnish cache, which in turn pulls from kojipkgs01/02 (the same servers used by other builders).

There's nothing I can see off in the kojipkgs logs.

So, perhaps the issue is in varnish...

I would start with something like this

diff --git a/builder/kojid b/builder/kojid
index d96a8ae5..7cba764b 100755
--- a/builder/kojid
+++ b/builder/kojid
@@ -1335,6 +1335,8 @@ class BuildArchTask(BaseBuildTask):
         fn = self.localPath("work/%s" % pkg)
         if not os.path.exists(fn):
             raise koji.BuildError("SRPM file missing: %s" % fn)
+        if not rpm_digest_ok(fn):
+            raise koji.BuildError("SRPM file corrupted: %s" % fn)
         # peel E:N-V-R from package
         h = koji.get_rpm_header(fn)
         name = koji.get_header_field(h, 'name')

and it could serve as a generally useful check, not specific for debugging this particular issue.

If truncating a file causes this, then I think there's also a defect in the error handling in cpio, because "Inappropriate ioctl for device" does not say "corrupted file" to me.

Inappropriate ioctl for device is when mock with nspawn is in between, No such file or directory is an error in "normal" case

I've observed another error in the same vein:

https://koji.fedoraproject.org/koji/taskinfo?taskID=37848374

It has the slightly different error of:

Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/koji/daemon.py", line 1295, in runTask
    response = (handler.run(),)
  File "/usr/lib/python3.7/site-packages/koji/tasks.py", line 311, in run
    return koji.util.call_with_argcheck(self.handler, self.params, self.opts)
  File "/usr/lib/python3.7/site-packages/koji/util.py", line 263, in call_with_argcheck
    return func(*args, **kwargs)
  File "/usr/sbin/kojid", line 1291, in handler
    h = koji.get_rpm_header(fn)
  File "/usr/lib/python3.7/site-packages/koji/__init__.py", line 887, in get_rpm_header
    hdr = ts.hdrFromFdno(fo.fileno())
  File "/usr/lib64/python3.7/site-packages/rpm/transaction.py", line 186, in hdrFromFdno
    raise rpm.error("error reading package header")
_rpm.error: error reading package header

Actually I'm not sure if this one is related, but it happened on a s390x host, so...

https://koji.fedoraproject.org/koji/taskinfo?taskID=37858831

Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/koji/daemon.py", line 1295, in runTask
    response = (handler.run(),)
  File "/usr/lib/python3.7/site-packages/koji/tasks.py", line 311, in run
    return koji.util.call_with_argcheck(self.handler, self.params, self.opts)
  File "/usr/lib/python3.7/site-packages/koji/util.py", line 263, in call_with_argcheck
    return func(*args, **kwargs)
  File "/usr/sbin/kojid", line 971, in handler
    h = self.readSRPMHeader(srpm)
  File "/usr/sbin/kojid", line 1056, in readSRPMHeader
    with koji.openRemoteFile(relpath, **opts) as fo:
  File "/usr/lib/python3.7/site-packages/koji/__init__.py", line 1613, in openRemoteFile
    src = six.moves.urllib.request.urlopen(url)
  File "/usr/lib64/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib64/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib64/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib64/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/lib64/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib64/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 503: Backend fetch failed

Task failed with

error: unpacking of archive failed on file /builddir/build/SOURCES/linphone-3.6.1.tar.gz;5d923bc7: cpio: read failed - Inappropriate ioctl for device

https://koji.fedoraproject.org/koji/taskinfo?taskID=37969762

@kevin Can you see something in varnish logs?

Another occurance:

https://koji.fedoraproject.org/koji/taskinfo?taskID=37981909

DEBUG util.py:595:  Updating / installing...
DEBUG util.py:595:  eclipse-cdt-2:9.9.0-1.module_f31+6676+########################################
DEBUG util.py:593:  error: unpacking of archive failed on file /builddir/build/SOURCES/org.eclipse.cdt-7e4f3a84db67f87399a41a35ba11ac591ab9117e.tar.xz;5d934a47: cpio: read failed - Inappropriate ioctl for device
DEBUG util.py:593:  error: /builddir/build/originals/eclipse-cdt-9.9.0-1.module_f31+6676+830996ad.src.rpm cannot be installed

And another:

https://koji.fedoraproject.org/koji/taskinfo?taskID=39324111

DEBUG util.py:598:  Updating / installing...
DEBUG util.py:598:  protobuf-3.6.1-6.module_f32+7102+8b113########################################
DEBUG util.py:596:  error: unpacking of archive failed on file /builddir/build/SOURCES/protobuf-3.6.1-all.tar.gz;5ddbe3b6: cpio: read failed - Inappropriate ioctl for device
DEBUG util.py:596:  error: /builddir/build/originals/protobuf-3.6.1-6.module_f32+7102+8b1132d5.src.rpm cannot be installed
DEBUG util.py:744:  Child return code was: 1
DEBUG util.py:329:  kill orphans
DEBUG util.py:763:  child environment: None

Is anyone looking at this? I'm happy to keep reporting occurrences when they happen.

On our side, we've added #290 to 1.20 release roadmap. It should at least find some problems, but it is probably only a subset of this issue. For fixing this, I would work more with fedora infra (@kevin ?)t, as it seems not to be related to koji itself, but to some infra setup.

Metadata Update from @tkopecek:
- Issue set to the milestone: 1.20

2 years ago

Metadata Update from @julian8628:
- Issue marked as depending on: #290

2 years ago

Metadata Update from @dgregor:
- Issue assigned to tkopecek (was: dgregor)

2 years ago

This should be fixed with #290

Metadata Update from @tkopecek:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 years ago

Metadata Update from @jcupova:
- Issue tagged with: testing-done

2 years ago

Hey all, I saw this again today, but I don't have permission to reopen the ticket so I filed a new one at #1974

Login to comment on this ticket.

Metadata