gfs_controld: fix plock recovery
When there are two nodes in the cluster and the
node in charge of the plock checkpoint fails,
the remaining node does not unlink the checkpoint
that had been created by the failed node. When
the failed node returns and the remaining node
attempts to transfer plock state to it, the
remaining node fails to create a new checkpoint
because it never unlinked the previous checkpoint
created by the failed node. As a result, existing
plock state is not transferred to the newly joined
node. The newly joined node
will then mistakenly grant plocks to itself that
may conflict with plocks that the other node could
not transfer. This leads to:
1. conflicting plocks being held concurrently
2. dangling plocks that are no longer held but were never removed
As explained above, the remaining node does not
unlink the checkpoint created by the other node
because it does not know that the other node was
in charge of the checkpoint.
It could only know this if it had been present before
and after the previous membership change. Because
there are only two nodes, this was not possible.
This, however, is also the point exploited to fix
the problem. When there are only two members, a new
node can assume that the other node is in charge of
the checkpoint.
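The two-member assumption can be sketched roughly as
follows. This is an illustration only, not the code in
this patch: the function name, its parameters, and the
convention that node IDs are positive (so 0 can mean
"unknown") are all assumptions made for the example.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not taken from gfs_controld.
 *
 * Given the current member list and our own node ID, return the
 * node ID we may assume is in charge of the plock checkpoint, or
 * 0 if no assumption can be made.
 *
 * Assumption: node IDs are positive, so 0 is free to mean "unknown".
 */
int assumed_ckpt_master(const int *member_ids, size_t member_count,
			int our_nodeid)
{
	/* The shortcut only applies to a two-member group. */
	if (member_count != 2)
		return 0;

	/* With exactly two members, the master must be the other one. */
	if (member_ids[0] == our_nodeid)
		return member_ids[1];
	if (member_ids[1] == our_nodeid)
		return member_ids[0];

	/* We are not in the member list; make no assumption. */
	return 0;
}
```

With a record like this, the remaining node would know
after a later failure of the other node that it has to
unlink that node's checkpoint before creating its own.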
The following test shows the problem and the fix using
a program, "doplock", that requests an exclusive,
blocking POSIX lock on the given file.
node1: mount /gfs
node2: mount /gfs
node1: touch /gfs/test
node1: doplock /gfs/test (granted)
node2: doplock /gfs/test (blocks)
node1: killed
node2: recovery for node1
node2: doplock above granted the lock
node1: restarts
node1: mount /gfs
node1: doplock /gfs/test
In the last step, the node1 doplock should block
because node2 holds the lock. Before the fix,
it was granted.
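The "doplock" program itself is not included here. A
minimal sketch of what its core might look like, using
the standard fcntl(2) F_SETLKW interface, is shown below.
The function body is a reconstruction from the
description above (exclusive, blocking, whole file) and
not the original test program.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Sketch of "doplock": take an exclusive, blocking POSIX lock on
 * the whole file at 'path'.
 *
 * Returns the open file descriptor on success (the lock is held
 * until the descriptor is closed or the process exits), or -1 on
 * error.
 */
int doplock(const char *path)
{
	struct flock fl;
	int fd;

	/* A write lock requires the file to be open for writing. */
	fd = open(path, O_RDWR);
	if (fd < 0) {
		perror("open");
		return -1;
	}

	memset(&fl, 0, sizeof(fl));
	fl.l_type = F_WRLCK;	/* exclusive lock */
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;		/* 0 means "to end of file": whole file */

	/* F_SETLKW waits until the lock can be granted. */
	if (fcntl(fd, F_SETLKW, &fl) < 0) {
		perror("fcntl");
		close(fd);
		return -1;
	}

	return fd;
}
```

A small main() would call doplock(argv[1]) and then wait
(for example with pause()) so that the lock stays held
while the other node is tested.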
Signed-off-by: David Teigland <teigland@redhat.com>