#7784 "deadlock detected" during build, causing build to fail
Closed: Upstream 4 years ago by kevin. Opened 4 years ago by mbooth.

Error that caused the build to fail:

<class 'psycopg2.extensions.TransactionRollbackError'>: deadlock detected
DETAIL:  Process 8928 waits for ShareLock on transaction 3416696641; blocked by process 8054.
Process 8054 waits for ShareLock on transaction 3416696834; blocked by process 8928.
HINT:  See server log for query details.

This happened during the i686 build of package "jna" for example:

https://koji.fedoraproject.org/koji/taskinfo?taskID=34770460


Might be a mbs issue... adding @jkaluza and factory2

Metadata Update from @kevin:
- Issue assigned to jkaluza
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: factory2, mbs

4 years ago

Nothing MBS can do should be able to cause a database-level deadlock. This is definitely a Koji issue.

@kevin Please unassign @jkaluza . @mikem care to take a look?

Done. Sorry about that...

Metadata Update from @kevin:
- Assignee reset
- Issue untagged with: factory2, mbs
- Issue tagged with: koji

4 years ago

So did anyone see the "server log for query details" ? I obviously am unable to view server logs...

The postgres logs have:

ERROR:  deadlock detected
DETAIL:  Process 8928 waits for ShareLock on transaction 3416696641; blocked by process 8054.
        Process 8054 waits for ShareLock on transaction 3416696834; blocked by process 8928.
        Process 8928: INSERT INTO rpminfo (arch, build_id, buildroot_id, buildtime, epoch, external_repo_id, id, name, payloadhash, release, size, version) VALUES ('noarch', NULL, NULL, 1548964101, NULL, 33, 17523895, 'findbugs', '5242e8fb6c9cc4e9cb023bd01e2d549d', '17.fc30', 8430603, '3.0.1')
        Process 8054: INSERT INTO rpminfo (arch, build_id, buildroot_id, buildtime, epoch, external_repo_id, id, name, payloadhash, release, size, version) VALUES ('noarch', NULL, NULL, 1550078283, NULL, 33, 17523897, 'reflections', 'e65c0823223595b7553900b26c4d7b7d', '8.fc30', 152541, '0.9.10')

Perhaps a missing lock and two things finished right at exactly the same second?

Can I get the following?

  • The full kojid traceback from kojid.log on the host that was running the buildArch task
  • The full hub traceback from the httpd logs

The buildArch task doesn't import the rpms it generates (that happens in the parent task later), but it can end up adding external rpm entries (to record the contents of the buildroot when external repos are used). I guess there is potential for deadlock there with the implicit locks from dml operations.

How often does this happen?

May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]: 2019-05-10 19:03:34,833 [WARNING] {40102} koji
.TaskManager:1304 FAULT:
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]: Traceback (most recent call last):
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:   File "/usr/lib/python2.7/site-packages/koji/daemon.py", line 1295, in runTask
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:     response = (handler.run(),)
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:   File "/usr/lib/python2.7/site-packages/koji/tasks.py", line 311, in run
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:     return koji.util.call_with_argcheck(self.handler, self.params, self.opts)
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:   File "/usr/lib/python2.7/site-packages/koji/util.py", line 263, in call_with_argcheck
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:     return func(*args, **kwargs)
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:   File "/usr/sbin/kojid", line 1309, in handler
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:     broot.build(fn,arch)
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:   File "/usr/sbin/kojid", line 592, in build
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:     self.session.host.updateBuildRootList(self.id,self.getPackageList())
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:   File "/usr/lib/python2.7/site-packages/koji/__init__.py", line 2055, in __call__
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:     return self.__func(self.__name, args, opts)
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:   File "/usr/lib/python2.7/site-packages/koji/__init__.py", line 2546, in _callMethod
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:     raise err
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]: Fault: <Fault 1: "<class 'psycopg2.extensions.TransactionRollbackError'>: deadlock detected
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]: DETAIL:  Process 8928 waits for ShareLock on transaction 3416696641; blocked by process 8054.
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]: Process 8054 waits for ShareLock on transaction 3416696834; blocked by process 8928.
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]: HINT:  See server log for query details.
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]: ">
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]: WARNING:koji.TaskManager:FAULT:
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]: Traceback (most recent call last):
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:   File "/usr/lib/python2.7/site-packages/koji/daemon.py", line 1295, in runTask
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:     response = (handler.run(),)
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:   File "/usr/lib/python2.7/site-packages/koji/tasks.py", line 311, in run
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:     return koji.util.call_with_argcheck(self.handler, self.params, self.opts)
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:   File "/usr/lib/python2.7/site-packages/koji/util.py", line 263, in call_with_argcheck
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:     return func(*args, **kwargs)
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:   File "/usr/sbin/kojid", line 1309, in handler
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:     broot.build(fn,arch)
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:   File "/usr/sbin/kojid", line 592, in build
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:     self.session.host.updateBuildRootList(self.id,self.getPackageList())
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:   File "/usr/lib/python2.7/site-packages/koji/__init__.py", line 2055, in __call__
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:     return self.__func(self.__name, args, opts)
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:   File "/usr/lib/python2.7/site-packages/koji/__init__.py", line 2546, in _callMethod
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]:     raise err
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]: Fault: <Fault 1: "<class 'psycopg2.extensions.TransactionRollbackError'>: deadlock detected
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]: DETAIL:  Process 8928 waits for ShareLock on transaction 3416696641; blocked by process 8054.
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]: Process 8054 waits for ShareLock on transaction 3416696834; blocked by process 8928.
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]: HINT:  See server log for query details.
May 10 19:03:34 buildhw-07.phx2.fedoraproject.org kojid[1595]: ">
May 10 19:03:43 buildhw-07.phx2.fedoraproject.org kojid[1595]: ERROR:root:https://kojipkgs.fedoraproject.org///repos/module-eclipse-2019-03-2920190510184754-508ead3c-build/1160672/x86_64

I'm afraid the hub logs have rotated off already. :(

This is the only case I am aware of of this...

This is reminiscent of https://pagure.io/koji/issue/788, but we put in what I thought was a pretty solid workaround for that.

..... unless your hub is still 1.15, but I can't imagine it is....

Hmm, I see. The savepoint we added doesn't help here because it re-raises if the entry is still missing, which is likely the case because the other transaction isn't finished yet.

I expect it takes a pretty rare race to trigger this. Can someone file this against Koji if they haven't already? Something like the following may help:

https://github.com/mikem23/koji-playground/commits/br-list-deadlock

I wouldn't rush to apply a fix though, unless you've seen this more than once.

Opened https://pagure.io/koji/issue/1450 upstream to track this.

Thanks!

:koala:

Metadata Update from @kevin:
- Issue close_status updated to: Upstream
- Issue status updated to: Closed (was: Open)

4 years ago

Login to comment on this ticket.

Metadata