Since yesterday, dev, stg and prod have all been failing intermittently (roughly once every few hours) with this error:
[libtaskotron:vm.py:104] 2018-03-02 08:44:23 DEBUG spawning testcloud instance taskotron-e4daa084-1df5-11e8-80ca-525400ee7c53
[testcloud.instance:instance.py:415] 2018-03-02 08:44:23 DEBUG Creating instance taskotron-e4daa084-1df5-11e8-80ca-525400ee7c53
libvirt: DBus Utils error : File exists
[libtaskotron:logger.py:88] 2018-03-02 08:44:25 CRITICAL Traceback (most recent call last):
  File "/usr/bin/runtask", line 11, in <module>
    load_entry_point('libtaskotron==0.5.0', 'console_scripts', 'runtask')()
  File "/usr/lib/python2.7/site-packages/libtaskotron/main.py", line 200, in main
    finished = executor.execute()
  File "/usr/lib/python2.7/site-packages/libtaskotron/executor.py", line 317, in execute
    ipaddr = self._spawn_vm(self.arg_data['uuid'])
  File "/usr/lib/python2.7/site-packages/libtaskotron/executor.py", line 58, in _spawn_vm
    self.task_vm.prepare(**env)
  File "/usr/lib/python2.7/site-packages/libtaskotron/ext/disposable/vm.py", line 141, in prepare
    self._prepare_instance(tc_image)
  File "/usr/lib/python2.7/site-packages/libtaskotron/ext/disposable/vm.py", line 106, in _prepare_instance
    tc_instance.start()
  File "/usr/lib/python2.7/site-packages/testcloud/instance.py", line 417, in start
    create_status = dom.create()
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1062, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: File exists
http://taskotron-dev.fedoraproject.org/taskmaster/builders/x86_64/builds/878181/steps/runtask/logs/stdio
@lbrabec believes that the error comes from libvirt calling systemd over dbus: https://github.com/libvirt/libvirt/blob/master/src/util/virsystemd.c#L325
As it turns out, the machines received a systemd update yesterday:
[root@qa11 ~][PROD]# dnf history info 68
Transaction ID : 68
Begin time : Thu 01 Mar 2018 08:47:33 PM UTC
Begin rpmdb : 897:165ec23fbf00bf2c54183c3a81d74498f1629406
End time : Thu 01 Mar 2018 08:47:46 PM UTC (13 seconds)
End rpmdb : 897:61d3abdb617c21a6039fd0dbacd05639ca811a74
User : System <unset>
Return-Code : Success
Transaction performed with:
    Installed dnf-2.7.5-2.fc27.noarch @updates
    Installed rpm-4.14.0-2.fc27.x86_64 @anaconda
Packages Altered:
    Upgraded python2-crypto-2.6.1-19.fc27.x86_64 @fedora-27-updates
    Upgrade 2.6.1-22.fc27.x86_64 @updates
    Upgraded systemd-234-9.fc27.x86_64 @fedora-27-updates
    Upgrade 234-10.git5f8984e.fc27.x86_64 @updates
    Upgraded systemd-container-234-9.fc27.x86_64 @updates
    Upgrade 234-10.git5f8984e.fc27.x86_64 @updates
    Upgraded systemd-libs-234-9.fc27.x86_64 @fedora-27-updates
    Upgrade 234-10.git5f8984e.fc27.x86_64 @updates
    Upgraded systemd-pam-234-9.fc27.x86_64 @fedora-27-updates
    Upgrade 234-10.git5f8984e.fc27.x86_64 @updates
    Upgraded systemd-udev-234-9.fc27.x86_64 @fedora-27-updates
    Upgrade 234-10.git5f8984e.fc27.x86_64 @updates
    Upgraded unbound-libs-1.6.8-1.fc27.x86_64 @updates
    Upgrade 1.6.8-6.fc27.x86_64 @updates
Scriptlet output:
   1 Running as unit: run-r87f4b8db0c024c2ea740fb5ac221a8b8.service
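To double-check exactly which packages a transaction touched (for example, to confirm that apart from systemd only python2-crypto and unbound-libs changed in transaction 68), the `dnf history info` output can be parsed. The following is a minimal Python sketch, not an existing tool: the function name `upgraded_packages` is hypothetical, and the regular expression assumes the `Upgraded <old NEVRA> @repo` / `Upgrade <new version> @repo` layout seen in the output above, which may differ between dnf versions.

```python
import re
from typing import List, Tuple

# Matches an "Upgraded <old-nevra> @repo" entry followed by its
# "Upgrade <new-version> @repo" entry. The layout is an assumption based on
# the dnf 2.7 output pasted in this ticket. "\s+" also matches newlines,
# so both the multi-line and a flattened single-line paste work.
_PAIR = re.compile(
    r"Upgraded\s+(?P<old>\S+)\s+@\S+\s+Upgrade\s+(?P<new>\S+)\s+@\S+"
)


def upgraded_packages(history_text: str) -> List[Tuple[str, str, str]]:
    """Return (name, old_version, new_version) for each upgraded package."""
    result = []
    for m in _PAIR.finditer(history_text):
        old_nevra = m.group("old")  # e.g. systemd-234-9.fc27.x86_64
        new_ver = m.group("new")    # e.g. 234-10.git5f8984e.fc27.x86_64
        # The package name is everything before the last two '-' separators
        # (version and release.arch), so names containing '-' are kept whole.
        name, version, release_arch = old_nevra.rsplit("-", 2)
        result.append((name, f"{version}-{release_arch}", new_ver))
    return result


if __name__ == "__main__":
    sample = """
    Upgraded systemd-234-9.fc27.x86_64 @fedora-27-updates
    Upgrade 234-10.git5f8984e.fc27.x86_64 @updates
    Upgraded unbound-libs-1.6.8-1.fc27.x86_64 @updates
    Upgrade 1.6.8-6.fc27.x86_64 @updates
    """
    # Print only the systemd-related upgrades.
    for name, old, new in upgraded_packages(sample):
        if name.startswith("systemd"):
            print(f"{name}: {old} -> {new}")
```

Splitting with `rsplit("-", 2)` rather than a fixed pattern keeps multi-part names such as `systemd-container` and `python2-crypto` intact.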
We rebooted taskotron-dev (qa11) to see whether that would fix the problem. If it doesn't, we can try to downgrade systemd, and if that helps, we'll need to file a bug against it. Also, it would be helpful to figure out whether we can somehow enable more debug logs from the systemd dbus service.
The errors haven't occurred on dev since the reboot, so we rebooted stg as well and are waiting to see what happens.
Production has been rebooted too; now waiting to see whether the errors recur.
This seems fixed now; the reboots helped.
Metadata Update from @kparal:
- Issue close_status updated to: Fixed