#7355 losetup issues causing createLiveMedia tasks to fail
Closed: Insufficient data 6 years ago Opened 6 years ago by dustymabe.

This has been happening quite a bit lately, but doesn't happen every time so it seems there are some outside factors involved.

DEBUG util.py:439:  subprocess.CalledProcessError: Command '['losetup', '--find', '--show', '/thebuilddir/tmp/lmc/lmc-disk-yfdl91yg.img']' returned non-zero exit status 1.

Some tasks where this failed in the Fedora-Rawhide-20180227.n.0 compose:

Some theories are:

  • running out of loopback devices on builder machines
  • the lmc-disk-yfdl91yg.img tmp file actually not being there

@puiterwijk tracked down one instance and saw this error message:

losetup: /thebuilddir/tmp/lmc/lmc-disk-bh6adbnt.img: failed to set up loop device: No such file or directory

we need to figure this out as it's going to cause us real pain over time.


So the way the file ultimately gets created is this:

  1. make_image in livemedia-creator gets a temporary file name by calling disk_img = tempfile.mktemp(prefix="lmc-disk-", suffix=".img", dir=opts.result_dir) - note this does NOT create the file
  2. It also gets a size for the image file by calling disk_size = calculate_disk_size(opts, ks)
  3. It then passes both disk_img and disk_size to novirt_install
  4. novirt_install passes disk_img and disk_size through to imgutils.mkext4img, which is a wrapper around imgutils.mkfsimage, which calls them outfile and size
  5. mkfsimage passes them to the LoopDevcontext class, which calls them filename and size
  6. LoopDev.__init__ calls mksparse(self.filename, size) (we always have a 'size' on this path, I'm pretty sure)
  7. mksparse does its thing

What mksparse does is this:

def mksparse(outfile, size):
    '''use os.ftruncate to create a sparse file of the given size.'''
    fobj = open(outfile, "w")
    os.ftruncate(fobj.fileno(), size)

So I'm wondering whether it's possible for that to basically fail quite silently and not actually create the file. I also wonder if it's a bug that it doesn't close the object before returning.

On this:

"I also wonder if it's a bug that it doesn't close the object before returning."

Technically it is, because this is relying on an implementation detail of CPython. But practically speaking, it shouldn't be. CPython does reference counting-based garbage collection, and part of its garbage collection is closing file objects, so fobj will get closed at the end of the function (as the last reference to it goes away then).

Note: current compose - Fedora-Rawhide-20180228.n.0 - is running with a slightly patched lorax that should log whether the temp image file exists before and after mksparse is run. I wanted to see the results from this before I went to bed but the compose hasn't quite hit livemedia phase yet :/ Will look in the morning.

On this:
"I also wonder if it's a bug that it doesn't close the object before returning."
Technically it is, because this is relying on an implementation detail of CPython. But practically speaking, it shouldn't be. CPython does reference counting-based garbage collection, and part of its garbage collection is closing file objects, so fobj will get closed at the end of the function (as the last reference to it goes away then).

While that is all true, I don't believe you can/should expect that the GC will reap the file handle immediately upon exit of the function. I'm quite certain the implementation reserves the right to do so whenever it feels best, e.g., free mem pressure, program exit , etc.. This could explain the intermittent nature of the problem quite easily.

I always prefer to use a context manager for these types of things:

with open(outfile, "w") as fobj:
    os.ftruncate(fobj.fileno(), size)

It's just as compact, but does guarantee fobj is closed immediately upon exit of the with scope.

Still don't think that's a plausible candidate, as the file handle doesn't actually have to be closed before the file exists. If I do this in a console:

>>> import os
>>> fh = open('/tmp/foofoofoo', 'w')
>>> os.ftruncate(fh.fileno(), 8192)

switch to another tab and check if /tmp/foofoofoo exists (it didn't before I started, obviously)...it does.

That's the behavior I would expect. I mostly wanted to just add my own knowledge about the subtle behavioral quirks of closing/GC .

This seems to have sorta mysteriously disappeared in more recent compose attempts, but there's so much churn in composes ATM I'm a bit reluctant to close the issue yet.

Let's just close this now. For whatever reason, doesn't seem to be affecting us any more. Can someone close it? I can't.

Metadata Update from @kevin:
- Issue close_status updated to: Insufficient data
- Issue status updated to: Closed (was: Open)

6 years ago

Login to comment on this ticket.

Metadata