afb6cf2 gfs_controld: retry recovery for withdrawn journal

Authored and Committed by teigland 16 years ago
    bz 442451
    
    This is unfortunate, but seems to be the best solution available.  The
    problem, described more fully in the bz, is that when gfs_controld tries
    to recover a journal after a withdraw, the withdrawing node may not yet
    have cleared its dlm locks.  The journal lock may therefore still be
    held by the withdrawing node, so every recovering node fails to acquire
    it and no node does the recovery.  The solution is for all recovering
    nodes to retry recovery of a withdrawn journal until they succeed.  Only
    the first node to get the journal lock actually recovers the journal;
    the others see that it has been recovered and report success.
    
    Signed-off-by: David Teigland <teigland@redhat.com>
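
The retry behaviour described in the message can be illustrated with a small in-memory simulation. This is only a sketch of the idea, not the code from this commit: the real daemon is written in C and takes the journal lock through dlm, whereas `Journal`, `recover_once`, `recover_withdrawn_journal`, `stale_attempts`, `delay` and `max_attempts` are all hypothetical names made up for the example. The `max_attempts` cap is also an assumption added so the sketch cannot loop forever; the message itself describes retrying until success.

```python
import time


class Journal:
    """Hypothetical stand-in for a journal guarded by a cluster-wide lock."""

    def __init__(self, stale_attempts):
        # Number of lock attempts that will fail because the withdrawing
        # node has not yet released its locks (simulated, not real dlm).
        self.stale_attempts = stale_attempts
        self.holder = None
        self.recovered = False
        self.recovered_by = None

    def try_lock(self, node):
        """Non-blocking lock attempt; returns True if the lock was taken."""
        if self.stale_attempts > 0:
            self.stale_attempts -= 1
            return False
        if self.holder is not None:
            return False
        self.holder = node
        return True

    def unlock(self, node):
        if self.holder == node:
            self.holder = None


def recover_once(journal, node):
    """Make one recovery attempt; returns True on success."""
    if not journal.try_lock(node):
        return False
    try:
        # Only the first node to get the lock replays the journal. Later
        # nodes find it already recovered and simply report success.
        if not journal.recovered:
            journal.recovered = True
            journal.recovered_by = node
        return True
    finally:
        journal.unlock(node)


def recover_withdrawn_journal(journal, node, delay=0.0, max_attempts=None):
    """Retry recovery until it succeeds; returns the number of attempts."""
    attempts = 0
    while True:
        attempts += 1
        if recover_once(journal, node):
            return attempts
        if max_attempts is not None and attempts >= max_attempts:
            raise TimeoutError("journal lock still held after retries")
        time.sleep(delay)


# Usage: the withdrawing node keeps the lock for three attempts.
journal = Journal(stale_attempts=3)
first = recover_withdrawn_journal(journal, "node2")
second = recover_withdrawn_journal(journal, "node3")
```

In this run `node2` needs several attempts before the stale lock clears and is the node that recovers the journal, while `node3` acquires the lock afterwards, sees the journal already recovered, and reports success without redoing the work.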
    
        
file modified
+19 -0