76741bb qdiskd: Make multipath issues go away

6 files. Authored by lon 11 years ago, committed by fabbione 11 years ago.
    qdiskd: Make multipath issues go away
    
    Qdiskd historically has required significant tuning to work around
    delays which occur during multipath failover, overloaded I/O, and LUN
    trespasses in both device-mapper-multipath and EMC PowerPath
    environments.
    
    This patch goes a very long way towards eliminating false evictions
    when these conditions occur by making qdiskd whine to the other
    cluster members when it detects hung system calls.  When a cluster
    member whines, it indicates the source of the problem (which system
    call is hung), and the act of receiving a whine from a host indicates
    that qdiskd is operational, but that I/O is hung.  Hung I/O is different
    from losing storage entirely (where you get I/O errors).
    
    Possible problems:
    
    - Receive queue getting very full, causing messages to become blocked on
    a node where I/O is hung.  However, 1) that would take a very long time,
    and 2) the node should get evicted at that point anyway.
    
    Resolves: rhbz#782900
    
    This version of the patch is a backport of:
    e2937eb33f224f86904fead08499a6178868ca6a
    34d2872fb7e60be1594158acaaeb8acd74f78d22
    
    There is a minor change versus the original patch, based on how qdiskd
    in RHEL5 handles the cman connection. We add an extra call to cman_alive
    in the main qdisk_loop to make sure data does not stall on the
    cman port and that the data_callback to qdiskd_whine is executed.
    
    Signed-off-by: Lon Hohberger <lhh@redhat.com>
    Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
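
The distinction the commit message draws between a peer whose I/O is hung and one that has lost storage or died can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: every type, field, and function name here (`whine_msg`, `peer_state`, `record_whine`, `judge_peer`, the `whine_window` parameter) is hypothetical, and the actual policy qdiskd applies after receiving a whine may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* All names below are hypothetical; none are taken from the qdiskd sources. */

/* Which system call the whining node believes is stuck. */
enum hung_call { HUNG_NONE = 0, HUNG_OPEN, HUNG_READ, HUNG_WRITE };

/* A "whine" sent over the cluster network rather than the quorum disk. */
struct whine_msg {
    int node_id;
    enum hung_call call;
    time_t sent_at;
};

/* What one node remembers about a peer. */
struct peer_state {
    int missed_updates;       /* consecutive missed disk heartbeats */
    time_t last_whine;        /* 0 if no whine has been received */
    enum hung_call last_call; /* source of the problem, per the last whine */
};

enum peer_verdict { PEER_OK, PEER_IO_HUNG, PEER_EVICT };

/* Record that a peer reported a hung system call. */
void record_whine(struct peer_state *p, const struct whine_msg *m)
{
    assert(p != NULL && m != NULL);
    p->last_whine = m->sent_at;
    p->last_call = m->call;
}

/*
 * Decide how to treat a peer. A recent whine shows the peer's daemon is
 * still running, so missed disk updates are attributed to hung I/O and
 * the peer is not evicted. Without a recent whine, the peer is treated
 * as gone once it has missed `tko` updates.
 */
enum peer_verdict judge_peer(const struct peer_state *p, int tko,
                             time_t now, time_t whine_window)
{
    assert(p != NULL);
    if (p->missed_updates < tko)
        return PEER_OK;
    if (p->last_whine != 0 && now - p->last_whine <= whine_window)
        return PEER_IO_HUNG;
    return PEER_EVICT;
}
```

In this sketch a stale whine (older than `whine_window`) no longer protects the peer, which loosely mirrors the note above that a node whose I/O stays hung long enough should get evicted anyway.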
    
        
file modified
+1 -0
file modified
+1 -1
file modified
+6 -0
file modified
+14 -3
file modified
+3 -1
file modified
+49 -5