#2714 kojira rmtree jobs can stall due to locking race
Closed: Fixed 3 years ago by tkopecek. Opened 3 years ago by mikem.

I don't think this will happen with current versions of python, but on py3.6 at least I'm seeing behavior where kojira's rmtree jobs will randomly stall on a lock and remain that way forever.

I believe this is due to the combination of threads and os.fork use in kojira. Every fork (i.e. when kojira starts an rmtree job), there is a chance that another thread holds the logging lock. If it does, then the child process is forever unable to acquire the logging lock. Over time, kojira will fill up to its limit with stalled rmtree jobs and never progress.

This was previously not an issue for us because the rmtree jobs did not perform logging, or at least not often. With changes in the past year, they do so more regularly.


This is what a backtrace for such a stalled process looks like:

(gdb) py-bt
Traceback (most recent call first):
  <built-in method acquire of _thread.RLock object at remote 0x7f84164dc6f0>
  File "/usr/lib64/python3.6/logging/__init__.py", line 814, in acquire
    self.lock.acquire()
  File "/usr/lib64/python3.6/logging/__init__.py", line 863, in handle
    self.acquire()
  File "/usr/lib64/python3.6/logging/__init__.py", line 1516, in callHandlers
    hdlr.handle(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1454, in handle
    self.callHandlers(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1444, in _log
    self.handle(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1320, in warning
    self._log(WARNING, msg, args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/koji/util.py", line 439, in rmtree
    logger.warning("No such file/dir %s for removal" % path)
  File "./kojira", line 348, in _rmtree
    rmtree(path)
  File "./kojira", line 318, in checkQueue
    pid = self._rmtree(path)
  File "./kojira", line 565, in rmtreeLoop
    self.checkQueue()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()

Metadata Update from @mikem:
- Custom field Size adjusted to None

3 years ago

It appears that in python3.8 (and maybe 3.7) on, the logging module addresses this by using os.register_at_fork (added in 3.7) to call logging._acquireLock(). So, I expect this bug will not appear on more current python versions. However, 3.6 has no such workaround in place.

PR #2715 works around this issue by acquiring logging locks before forking (and releasing them after).

Metadata Update from @tkopecek:
- Issue set to the milestone: 1.25

3 years ago

Metadata Update from @jcupova:
- Issue tagged with: testing-ready

3 years ago

Metadata Update from @tkopecek:
- Issue set to the milestone: 1.24.1 (was: 1.25)

3 years ago

Metadata Update from @tkopecek:
- Issue untagged with: testing-ready
- Issue tagged with: no_qe

3 years ago

Login to comment on this ticket.

Metadata
Related Pull Requests
  • #2715 Merged 3 years ago