152251b monitor: Service restart fixes

1 file Authored by sgallagh 8 years ago, Committed by jhrozek 8 years ago,
    monitor: Service restart fixes
    
    There are actually two bugs here:
    
    1) When either the kill(SIGTERM) or the kill(SIGKILL) call failed
    (for any reason), we would talloc_free(svc), which made the service
    ineligible for restart. As a result, the service never started
    again until SSSD itself was restarted.
    
    2) There is a fairly wide race condition in which the SIGKILL timer
    can "catch up" to the child exit handler between us noticing the
    termination and actually restarting the service. The race happens
    because we re-enter the mainloop and add a restart timeout, so that
    we do not fail quickly if we keep restarting due to a transitory
    issue. The mt_svc object, and therefore the SIGKILL timer, were
    never freed until we got to the actual service restart.
    
    We can minimize this race by recording the timer_event for the
    SIGKILL timeout in the mt_svc object. This way, if the process
    exits via SIGTERM, we will immediately remove the timer for the
    SIGKILL. Additionally, we'll catch the special case of an ESRCH
    response from the kill(SIGKILL) and assume that it means that the
    process has exited. The only other two possible errors are:
     * EINVAL: (an invalid signal was specified) - This should be
               impossible, obviously.
     * EPERM: This process doesn't have permission to send signals to
              this PID. If this happens, it's either an SELinux bug or
              else the process has terminated and a new process that
              SSSD doesn't control has taken the ID over.
    
    So in the incredibly unlikely case that one of those occurs, we'll
    just go ahead and try to start a new process.
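
    The errno handling described above could be sketched roughly as
    follows. This is only an illustration, not the code from the patch:
    the enum kill_outcome and the functions classify_kill_errno() and
    send_sigkill() are hypothetical names that do not exist in SSSD.
    Only kill(), SIGKILL and the errno values come from POSIX.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical outcome codes for illustration; not SSSD identifiers. */
enum kill_outcome {
    KILL_SENT,          /* kill() succeeded; the child exit handler takes over */
    KILL_ALREADY_GONE,  /* ESRCH: no such process, assume it has exited */
    KILL_UNEXPECTED     /* EINVAL or EPERM: should be impossible or very rare */
};

/* Map the return value of kill() and the saved errno to an outcome. */
enum kill_outcome classify_kill_errno(int ret, int err)
{
    if (ret == 0) {
        return KILL_SENT;
    }
    if (err == ESRCH) {
        return KILL_ALREADY_GONE;
    }
    /* Remaining documented errors are EINVAL and EPERM. */
    return KILL_UNEXPECTED;
}

/* Send SIGKILL and classify the result. errno is read right after the
 * call, before anything else can overwrite it. */
enum kill_outcome send_sigkill(pid_t pid)
{
    int ret = kill(pid, SIGKILL);
    int err = errno;

    return classify_kill_errno(ret, err);
}
```

    In this sketch, a caller would treat KILL_UNEXPECTED the way the
    text above describes, by going ahead and trying to start a new
    process, and would never free the service object on a kill()
    failure.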
    
    This patch also removes the incorrect talloc_free(svc) calls on the
    kill() failures and replaces them with an attempt to just start up
    the service again and hope for the best.
    
    Resolves:
    https://fedorahosted.org/sssd/ticket/2525
    
    Reviewed-by: Pavel Březina <pbrezina@redhat.com>
    
        
file modified
+74 -20