#257 tasks crash irregularly with "libvirtError: File exists"
Closed: Fixed 6 years ago Opened 6 years ago by kparal.

Since yesterday, dev, stg and prod have all been failing irregularly (once every few hours) with this error:

[libtaskotron:vm.py:104] 2018-03-02 08:44:23 DEBUG   spawning testcloud instance taskotron-e4daa084-1df5-11e8-80ca-525400ee7c53
[testcloud.instance:instance.py:415] 2018-03-02 08:44:23 DEBUG   Creating instance taskotron-e4daa084-1df5-11e8-80ca-525400ee7c53
libvirt: DBus Utils error : File exists
[libtaskotron:logger.py:88] 2018-03-02 08:44:25 CRITICAL Traceback (most recent call last):
  File "/usr/bin/runtask", line 11, in <module>
    load_entry_point('libtaskotron==0.5.0', 'console_scripts', 'runtask')()
  File "/usr/lib/python2.7/site-packages/libtaskotron/main.py", line 200, in main
    finished = executor.execute()
  File "/usr/lib/python2.7/site-packages/libtaskotron/executor.py", line 317, in execute
    ipaddr = self._spawn_vm(self.arg_data['uuid'])
  File "/usr/lib/python2.7/site-packages/libtaskotron/executor.py", line 58, in _spawn_vm
    self.task_vm.prepare(**env)
  File "/usr/lib/python2.7/site-packages/libtaskotron/ext/disposable/vm.py", line 141, in prepare
    self._prepare_instance(tc_image)
  File "/usr/lib/python2.7/site-packages/libtaskotron/ext/disposable/vm.py", line 106, in _prepare_instance
    tc_instance.start()
  File "/usr/lib/python2.7/site-packages/testcloud/instance.py", line 417, in start
    create_status = dom.create()
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1062, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: File exists

http://taskotron-dev.fedoraproject.org/taskmaster/builders/x86_64/builds/878181/steps/runtask/logs/stdio
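
Since the failure only shows up once every few hours, it may help to scan collected runtask logs for the signature automatically. The following is a minimal sketch, assuming the log lines look like the excerpt above; the function name `find_file_exists_failures` and the idea of pairing the last "Creating instance" line with the error are illustrative assumptions, not part of libtaskotron or testcloud.

```python
import re

# Matches the testcloud "Creating instance <name>" debug line seen in the excerpt.
CREATE_RE = re.compile(r"Creating instance (\S+)")
# The final line of the traceback in the excerpt.
FAILURE_MARKER = "libvirtError: File exists"


def find_file_exists_failures(log_text):
    """Return names of instances whose creation was followed by the error.

    Hypothetical helper: it assumes the error belongs to the most recent
    "Creating instance" line that precedes it in the same log.
    """
    failures = []
    last_instance = None
    for line in log_text.splitlines():
        match = CREATE_RE.search(line)
        if match:
            last_instance = match.group(1)
        elif FAILURE_MARKER in line and last_instance is not None:
            failures.append(last_instance)
            last_instance = None
    return failures
```

Running it over the stdio log of each build would give a list of affected instance names, which can then be compared across dev, stg and prod.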

@lbrabec believes that the error comes from libvirt calling systemd over dbus:
https://github.com/libvirt/libvirt/blob/master/src/util/virsystemd.c#L325

As it turns out, the machines received a systemd update yesterday:

[root@qa11 ~][PROD]# dnf history info 68
Transaction ID : 68
Begin time     : Thu 01 Mar 2018 08:47:33 PM UTC
Begin rpmdb    : 897:165ec23fbf00bf2c54183c3a81d74498f1629406
End time       : Thu 01 Mar 2018 08:47:46 PM UTC (13 seconds)
End rpmdb      : 897:61d3abdb617c21a6039fd0dbacd05639ca811a74
User           : System <unset>
Return-Code    : Success
Transaction performed with:
    Installed     dnf-2.7.5-2.fc27.noarch  @updates
    Installed     rpm-4.14.0-2.fc27.x86_64 @anaconda
Packages Altered:
    Upgraded python2-crypto-2.6.1-19.fc27.x86_64             @fedora-27-updates
    Upgrade                 2.6.1-22.fc27.x86_64             @updates
    Upgraded systemd-234-9.fc27.x86_64                       @fedora-27-updates
    Upgrade          234-10.git5f8984e.fc27.x86_64           @updates
    Upgraded systemd-container-234-9.fc27.x86_64             @updates
    Upgrade                    234-10.git5f8984e.fc27.x86_64 @updates
    Upgraded systemd-libs-234-9.fc27.x86_64                  @fedora-27-updates
    Upgrade               234-10.git5f8984e.fc27.x86_64      @updates
    Upgraded systemd-pam-234-9.fc27.x86_64                   @fedora-27-updates
    Upgrade              234-10.git5f8984e.fc27.x86_64       @updates
    Upgraded systemd-udev-234-9.fc27.x86_64                  @fedora-27-updates
    Upgrade               234-10.git5f8984e.fc27.x86_64      @updates
    Upgraded unbound-libs-1.6.8-1.fc27.x86_64                @updates
    Upgrade               1.6.8-6.fc27.x86_64                @updates
Scriptlet output:
   1 Running as unit: run-r87f4b8db0c024c2ea740fb5ac221a8b8.service
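
To compare what changed across dev, stg and prod, the "Packages Altered" section above can be reduced to old and new versions per package. This is only a sketch under the assumption that every `Upgraded` line is immediately followed by its `Upgrade` line, as in the output above; `parse_upgrades` is a hypothetical helper, not a dnf API.

```python
import re

# "Upgraded <name>-<version>-<release>.<arch> @repo" followed by
# "Upgrade <version>-<release>.<arch> @repo", as in the output above.
UPGRADED_RE = re.compile(r"^\s*Upgraded\s+(\S+)\s+@\S+\s*$")
UPGRADE_RE = re.compile(r"^\s*Upgrade\s+(\S+)\s+@\S+\s*$")


def parse_upgrades(history_text):
    """Return {package name: (old version-release.arch, new version-release.arch)}."""
    upgrades = {}
    pending = None  # (name, old version) waiting for the matching "Upgrade" line
    for line in history_text.splitlines():
        old = UPGRADED_RE.match(line)
        if old:
            # Assumes name-version-release.arch, so the last two hyphens
            # separate the version and the release from the package name.
            name, version, release = old.group(1).rsplit("-", 2)
            pending = (name, f"{version}-{release}")
            continue
        new = UPGRADE_RE.match(line)
        if new and pending:
            upgrades[pending[0]] = (pending[1], new.group(1))
        pending = None
    return upgrades
```

Filtering the result for names starting with `systemd` shows the systemd packages that moved from 234-9 to 234-10.git5f8984e in this transaction.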

We rebooted taskotron-dev (qa11) to see whether that fixes the problem. If it doesn't, we can try downgrading systemd; if the downgrade helps, we'll need to file a bug against systemd. It would also help to figure out whether we can enable more debug logging from the systemd dbus service.


The errors haven't occurred on dev since the reboot, so we rebooted stg as well and are waiting to see what happens.

Production rebooted, waiting for errors now.

This seems fixed now; the reboots helped.

Metadata Update from @kparal:
- Issue close_status updated to: Fixed

6 years ago
