18411e7 dlm-kernel: can_be_granted workaround for user lockspaces

Authored and Committed by teigland 14 years ago
    
    bz 483685
    
    There is a problem in the _can_be_granted() logic regarding the
    waitqueue.  Normally, a lock request cannot be granted if there's
    another lock already on the waitqueue.  This gives FIFO fairness.
    One of the big problems in the RHEL4 dlm is that the waitqueue is
    misused: locks are kept there while waiting for the reply to a
    remote lock request.  This can break the normal lock granting logic.
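    A minimal C sketch of the normal rule for a new request, under
    stated assumptions: struct rsb, struct lkb, queue_add() and
    can_be_granted_fifo() are hypothetical simplifications, not the
    RHEL4 dlm code, and the convert queue is ignored.  Only the mode
    compatibility table is the standard DLM one.

```c
#include <assert.h>
#include <stdbool.h>

/* DLM lock modes, weakest to strongest. */
enum mode { NL, CR, CW, PR, PW, EX };

/* Standard DLM compatibility table: compat[granted][requested]. */
static const bool compat[6][6] = {
    /*         NL CR CW PR PW EX */
    /* NL */ { 1, 1, 1, 1, 1, 1 },
    /* CR */ { 1, 1, 1, 1, 1, 0 },
    /* CW */ { 1, 1, 1, 0, 0, 0 },
    /* PR */ { 1, 1, 0, 1, 0, 0 },
    /* PW */ { 1, 1, 0, 0, 0, 0 },
    /* EX */ { 1, 0, 0, 0, 0, 0 },
};

#define MAXQ 16

/* HYPOTHETICAL, heavily simplified lock and resource records. */
struct lkb {
    int id;
    enum mode grmode;           /* mode held once granted */
    enum mode rqmode;           /* mode requested */
};

struct rsb {
    struct lkb *grantq[MAXQ];
    int ngrant;
    struct lkb *waitq[MAXQ];    /* FIFO: index 0 is the oldest waiter */
    int nwait;
};

static void queue_add(struct lkb **q, int *n, struct lkb *lkb)
{
    assert(*n < MAXQ);          /* fixed-size queues in this sketch */
    q[(*n)++] = lkb;
}

/* Normal rule for a new request: no queue jumping, then compatibility. */
static bool can_be_granted_fifo(const struct rsb *r, const struct lkb *lkb)
{
    if (r->nwait > 0)
        return false;           /* another lock is already waiting */
    for (int i = 0; i < r->ngrant; i++)
        if (!compat[r->grantq[i]->grmode][lkb->rqmode])
            return false;       /* conflicts with a granted lock */
    return true;
}
```

    For example, with one PR lock granted, a new PR request passes the
    check, but the same request is refused as soon as any lock (here a
    conflicting EX request) sits on the waitqueue.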
    
    Example:
    
    Requests for lkb1 and lkb2 are both sent to another node that we
    think is the master.  Both are kept on the rsb waitqueue while
    waiting for a reply.  The remote node turns out not to be the
    master and returns EINVAL for lkb1.
    
    Local processing of lkb1 after getting the EINVAL reply:
    - lkb1 is removed from the waitqueue
    - we look up the master again (resdir is local for this rsb)
    - the master ends up being us
    - the rsb nodeid is set to 0
    - lkb1 is passed into dlm_lock_stage2 and dlm_lock_stage3
    - lkb1 is passed into _can_be_granted
    - _can_be_granted sees lkb2 on the waitqueue, so it says lkb1
      cannot be granted
    - so lkb1 is added to the waitqueue (in the proper sense,
      i.e. not because it's waiting for a remote reply but because
      of the master granting logic)
    
    Local processing of lkb2 after getting its EINVAL reply:
    - lkb2 is removed from the waitqueue
    - we see that we are now the master; the rsb nodeid is 0
    - lkb2 is passed into dlm_lock_stage2 and dlm_lock_stage3
    - lkb2 is passed into _can_be_granted
    - _can_be_granted sees lkb1 on the waitqueue, so it says lkb2
      cannot be granted
    - so lkb2 is added to the waitqueue (again in the proper sense)
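    The two sequences above can be replayed against a similar
    hypothetical model.  handle_einval_reply() is an illustrative
    stand-in that collapses the master lookup, dlm_lock_stage2 and
    dlm_lock_stage3 into one step, and it applies the normal FIFO rule
    from the first paragraph.  None of these names are the real dlm
    functions or structures.

```c
#include <assert.h>
#include <stdbool.h>

enum mode { NL, CR, CW, PR, PW, EX };

/* Standard DLM compatibility table: compat[granted][requested]. */
static const bool compat[6][6] = {
    /*         NL CR CW PR PW EX */
    /* NL */ { 1, 1, 1, 1, 1, 1 },
    /* CR */ { 1, 1, 1, 1, 1, 0 },
    /* CW */ { 1, 1, 1, 0, 0, 0 },
    /* PR */ { 1, 1, 0, 1, 0, 0 },
    /* PW */ { 1, 1, 0, 0, 0, 0 },
    /* EX */ { 1, 0, 0, 0, 0, 0 },
};

#define MAXQ 16

/* HYPOTHETICAL simplified records. */
struct lkb { int id; enum mode grmode; enum mode rqmode; };

struct rsb {
    int nodeid;                 /* 0 means this node is the master */
    struct lkb *grantq[MAXQ];
    int ngrant;
    struct lkb *waitq[MAXQ];    /* misused: also holds locks awaiting replies */
    int nwait;
};

static void queue_add(struct lkb **q, int *n, struct lkb *lkb)
{
    assert(*n < MAXQ);
    q[(*n)++] = lkb;
}

static void queue_del(struct lkb **q, int *n, struct lkb *lkb)
{
    for (int i = 0; i < *n; i++) {
        if (q[i] == lkb) {
            for (int j = i + 1; j < *n; j++)   /* keep FIFO order */
                q[j - 1] = q[j];
            (*n)--;
            return;
        }
    }
    assert(!"lkb not on queue");
}

/* Normal rule: refuse if anything is waiting, else check compatibility. */
static bool can_be_granted_fifo(const struct rsb *r, const struct lkb *lkb)
{
    if (r->nwait > 0)
        return false;
    for (int i = 0; i < r->ngrant; i++)
        if (!compat[r->grantq[i]->grmode][lkb->rqmode])
            return false;
    return true;
}

/* Illustrative EINVAL handling once the directory says we are master. */
static bool handle_einval_reply(struct rsb *r, struct lkb *lkb)
{
    queue_del(r->waitq, &r->nwait, lkb);   /* no longer awaiting a reply */
    r->nodeid = 0;                          /* we are the master now */
    if (can_be_granted_fifo(r, lkb)) {
        lkb->grmode = lkb->rqmode;
        queue_add(r->grantq, &r->ngrant, lkb);
        return true;
    }
    queue_add(r->waitq, &r->nwait, lkb);    /* queued "properly" this time */
    return false;
}
```

    In the usage below both requests are PR and the grant queue is
    empty, so neither conflicts with anything; each is refused only
    because the other is still on the waitqueue, and both end up
    queued with nothing granted.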
    
    Other lock requests then arrive for this rsb, and all continue to
    be added to the waitqueue because it's not empty.
    
    This patch makes _can_be_granted return TRUE for locks, like lkb1,
    that are being tested by the grant logic for the first time since
    they were requested, i.e. locks that are not already on the
    waitqueue.
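    A sketch of the fix, assuming the same kind of simplified model.
    Two points are assumptions not stated in this message: the
    exemption is taken to skip only the waitqueue (FIFO) check, with
    conflicts against granted locks still refused, and a lock already
    on the waitqueue is taken to be grantable only from the head of
    the queue.  The grant_now parameter stands in for the
    user-lockspace/tunable condition; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum mode { NL, CR, CW, PR, PW, EX };

/* Standard DLM compatibility table: compat[granted][requested]. */
static const bool compat[6][6] = {
    /*         NL CR CW PR PW EX */
    /* NL */ { 1, 1, 1, 1, 1, 1 },
    /* CR */ { 1, 1, 1, 1, 1, 0 },
    /* CW */ { 1, 1, 1, 0, 0, 0 },
    /* PR */ { 1, 1, 0, 1, 0, 0 },
    /* PW */ { 1, 1, 0, 0, 0, 0 },
    /* EX */ { 1, 0, 0, 0, 0, 0 },
};

#define MAXQ 16

/* HYPOTHETICAL simplified records. */
struct lkb { int id; enum mode grmode; enum mode rqmode; };

struct rsb {
    struct lkb *grantq[MAXQ];
    int ngrant;
    struct lkb *waitq[MAXQ];
    int nwait;
};

static void queue_add(struct lkb **q, int *n, struct lkb *lkb)
{
    assert(*n < MAXQ);
    q[(*n)++] = lkb;
}

/* Index of lkb on the waitqueue, or -1 if it is not there. */
static int waitq_pos(const struct rsb *r, const struct lkb *lkb)
{
    for (int i = 0; i < r->nwait; i++)
        if (r->waitq[i] == lkb)
            return i;
    return -1;
}

/* grant_now: HYPOTHETICAL switch for the workaround. */
static bool can_be_granted(const struct rsb *r, const struct lkb *lkb,
                           bool grant_now)
{
    /* ASSUMPTION: a conflict with a granted lock still blocks the request. */
    for (int i = 0; i < r->ngrant; i++)
        if (!compat[r->grantq[i]->grmode][lkb->rqmode])
            return false;

    int pos = waitq_pos(r, lkb);
    if (pos < 0)
        /* First test since the request: the workaround ignores other waiters. */
        return grant_now || r->nwait == 0;

    /* ASSUMPTION: locks already queued are granted in FIFO order. */
    return pos == 0;
}
```

    With grant_now false this reproduces the refusal of lkb1 in the
    example; with it true, lkb1 is granted even though lkb2 is still
    parked on the waitqueue.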
    
    To avoid regressions in this particularly sensitive area of code,
    the fix is only enabled for user lockspaces (e.g. clvmd and
    rgmanager), and it can be disabled by running
    
    echo 0 > /proc/cluster/config/dlm/user_grant_now
    
    in case it causes a regression in some user lockspace workload.
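    The gating can be sketched as follows.  The variable
    dlm_user_grant_now, struct lockspace and both functions are
    hypothetical; only the /proc path and the echo command come from
    this message, and the sketch assumes the tunable defaults to on,
    as "can be disabled" suggests.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* HYPOTHETICAL: value behind /proc/cluster/config/dlm/user_grant_now. */
static int dlm_user_grant_now = 1;       /* assumed default: enabled */

/* HYPOTHETICAL lockspace record. */
struct lockspace {
    const char *name;
    bool user;          /* created for userspace, e.g. clvmd, rgmanager */
};

/* Sketch of a write handler accepting "0" or "1", e.g. from "echo 0 > ...". */
static int user_grant_now_write(const char *buf)
{
    char *end;
    long val = strtol(buf, &end, 10);

    if (end == buf || (*end != '\0' && *end != '\n') || (val != 0 && val != 1))
        return -EINVAL;
    dlm_user_grant_now = (int)val;
    return 0;
}

/* The workaround applies only to user lockspaces, and only while enabled. */
static bool grant_now_enabled(const struct lockspace *ls)
{
    assert(ls != NULL);
    return ls->user && dlm_user_grant_now;
}
```

    Kernel lockspaces never take the new path, so their behaviour is
    unchanged whatever the tunable says.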
    
    Signed-off-by: David Teigland <teigland@redhat.com>
    
        
4 files changed, 28 insertions(+), 1 deletion(-)
    +7 -1
    +1 -0
    +11 -0
    +9 -0