c7ac251 qemu_migration: Avoid crashing if domain dies too quickly

Authored and Committed by mprivozn 10 years ago
    qemu_migration: Avoid crashing if domain dies too quickly
    
    I've noticed a SIGSEGV-ing libvirtd on the destination when the qemu
    died too quickly = in Prepare phase. What is happening here is:
    
    1) [Thread 3493] We are in qemuMigrationPrepareAny() and calling
    qemuProcessStart() which subsequently calls qemuProcessWaitForMonitor()
    and qemuConnectMonitor(). So far so good. The qemuMonitorOpen()
    succeeds, however switching monitor to QMP mode fails as qemu died
    meanwhile. That is qemuMonitorSetCapabilities() returns -1.
    
    2013-10-08 15:54:10.629+0000: 3493: debug : qemuMonitorSetCapabilities:1356 : mon=0x14a53da0
    2013-10-08 15:54:10.630+0000: 3493: debug : qemuMonitorJSONCommandWithFd:262 : Send command '{"execute":"qmp_capabilities","id":"libvirt-1"}' for write with FD -1
    2013-10-08 15:54:10.630+0000: 3493: debug : virEventPollUpdateHandle:147 : EVENT_POLL_UPDATE_HANDLE: watch=17 events=13
    ...
    2013-10-08 15:54:10.631+0000: 3493: debug : qemuMonitorSend:956 : QEMU_MONITOR_SEND_MSG: mon=0x14a53da0 msg={"execute":"qmp_capabilities","id":"libvirt-1"}
     fd=-1
    2013-10-08 15:54:10.631+0000: 3262: debug : virEventPollRunOnce:641 : Poll got 1 event(s)
    
    2) [Thread 3262] The event loop is trying to do the talking to monitor.
    However, qemu is dead already, remember?
    
    2013-10-08 15:54:13.436+0000: 3262: error : qemuMonitorIORead:551 : Unable to read from monitor: Connection reset by peer
    2013-10-08 15:54:13.516+0000: 3262: debug : virFileClose:90 : Closed fd 25
    ...
    2013-10-08 15:54:13.533+0000: 3493: debug : qemuMonitorSend:968 : Send command resulted in error internal error: early end of file from monitor: possible problem:
    
    3) [Thread 3493] qemuProcessStart() failed. No big deal. Go to the
    'endjob' label and subsequently to the 'cleanup'. Since the domain is
    not persistent and ret is -1, the qemuDomainRemoveInactive() is called.
    This has an (unpleasant) effect of virObjectUnref()-in the @vm object.
    Unpleasant because the event loop which is about to trigger EOF callback
    still holds a pointer to the @vm (not the reference). See the valgrind
    output below.
    
    4) [Thread 3262] So the event loop starts triggering EOF:
    
    2013-10-08 15:54:13.542+0000: 3262: debug : qemuMonitorIO:729 : Triggering EOF callback
    2013-10-08 15:54:13.543+0000: 3262: debug : qemuProcessHandleMonitorEOF:294 : Received EOF on 0x14549110 'migt10'
    
    And the monitor is cleaned up. This results in calling
    qemuProcessHandleMonitorEOF with the @vm pointer passed. The pointer is
    kept in qemuMonitor struct.
    
    ==3262== Thread 1:
    ==3262== Invalid read of size 4
    ==3262==    at 0x77ECCAA: pthread_mutex_lock (in /lib64/libpthread-2.15.so)
    ==3262==    by 0x52FAA06: virMutexLock (virthreadpthread.c:85)
    ==3262==    by 0x52E3891: virObjectLock (virobject.c:320)
    ==3262==    by 0x11626743: qemuProcessHandleMonitorEOF (qemu_process.c:296)
    ==3262==    by 0x11642593: qemuMonitorIO (qemu_monitor.c:730)
    ==3262==    by 0x52BD526: virEventPollDispatchHandles (vireventpoll.c:501)
    ==3262==    by 0x52BDD49: virEventPollRunOnce (vireventpoll.c:648)
    ==3262==    by 0x52BBC68: virEventRunDefaultImpl (virevent.c:274)
    ==3262==    by 0x542D3D9: virNetServerRun (virnetserver.c:1112)
    ==3262==    by 0x11F368: main (libvirtd.c:1513)
    ==3262==  Address 0x14549128 is 24 bytes inside a block of size 136 free'd
    ==3262==    at 0x4C2AF5C: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==3262==    by 0x529B1FF: virFree (viralloc.c:580)
    ==3262==    by 0x52E3703: virObjectUnref (virobject.c:270)
    ==3262==    by 0x531557E: virDomainObjListRemove (domain_conf.c:2355)
    ==3262==    by 0x1160E899: qemuDomainRemoveInactive (qemu_domain.c:2061)
    ==3262==    by 0x1163A0C6: qemuMigrationPrepareAny (qemu_migration.c:2450)
    ==3262==    by 0x1163A923: qemuMigrationPrepareDirect (qemu_migration.c:2626)
    ==3262==    by 0x11682D71: qemuDomainMigratePrepare3Params (qemu_driver.c:10309)
    ==3262==    by 0x53B0976: virDomainMigratePrepare3Params (libvirt.c:7266)
    ==3262==    by 0x1502D3: remoteDispatchDomainMigratePrepare3Params (remote.c:4797)
    ==3262==    by 0x12DECA: remoteDispatchDomainMigratePrepare3ParamsHelper (remote_dispatch.h:5741)
    ==3262==    by 0x54322EB: virNetServerProgramDispatchCall (virnetserverprogram.c:435)
    
    The mon->vm is set in qemuMonitorOpenInternal() which is the correct
    place to increase @vm ref counter. The correct place to decrease the ref
    counter is then qemuMonitorDispose().
    
    Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
    
        
file modified
+10 -4
file modified
+3 -1