On Fri, Mar 25, 2005 at 03:22:38PM -0800, Daniel McNeil wrote:
Looking at the code, the problem is a race condition between
dlm_astd() and release_lockspace(). dlm_astd() can pull an
lkb off the ast_queue and still be processing it while
release_lockspace() calls dlm_dir_clear() and
then kfree()s ls->ls_dirtbl. When dlm_astd() then calls
release_rsb(), it leads to a dlm_dir_remove() which accesses
the already-freed ls_dirtbl. With slab debug, this
leads to a spinning write_lock() and a hung umount. My machines
are 2-CPU systems, which might also help expose the race condition.
The fix is below and is fairly simple: just do the astd_suspend()
in release_lockspace() before the dlm_dir_clear() and kfree().
That way astd won't be processing an lkb on the ast_queue while
the lockspace is being freed.
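
To make the ordering concrete, here is a minimal userspace sketch of
the idea. It is not the patch and not the kernel code. It assumes
astd_suspend() behaves like taking a lock that the AST thread holds
while it processes an entry; the struct layout, the mutex, the helper
astd_process_one() and run_demo() are all hypothetical stand-ins, and
setting the table pointer to NULL is a simplification of how the real
code stops astd from seeing the lockspace again.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

/* Hypothetical stand-in for the lockspace; not the kernel struct. */
struct lockspace {
    unsigned int *dirtbl;   /* models ls->ls_dirtbl */
    size_t dirtbl_size;
};

/* Assumption: suspending astd amounts to holding this lock. */
static pthread_mutex_t astd_running = PTHREAD_MUTEX_INITIALIZER;

static void astd_suspend(void) { pthread_mutex_lock(&astd_running); }
static void astd_resume(void)  { pthread_mutex_unlock(&astd_running); }

/* One pass of the worker. It holds astd_running for as long as it
 * touches the table, so a suspender must wait for the pass to end.
 * Returns 1 if the table was still there, 0 once it is gone. */
static int astd_process_one(struct lockspace *ls)
{
    int touched = 0;

    pthread_mutex_lock(&astd_running);
    if (ls->dirtbl) {
        ls->dirtbl[0]++;    /* models dlm_dir_remove() using the table */
        touched = 1;
    }
    pthread_mutex_unlock(&astd_running);
    return touched;
}

static void *astd_thread(void *arg)
{
    struct lockspace *ls = arg;

    while (astd_process_one(ls))
        sched_yield();      /* give the releasing thread a chance */
    return NULL;
}

/* The ordering described above: suspend first, then clear and free. */
static void release_lockspace(struct lockspace *ls)
{
    astd_suspend();         /* waits for any in-flight pass */
    free(ls->dirtbl);       /* models dlm_dir_clear() + kfree() */
    ls->dirtbl = NULL;
    astd_resume();
}

/* Runs the worker against a releasing thread; returns 0 on success. */
static int run_demo(void)
{
    struct lockspace ls;
    pthread_t tid;

    ls.dirtbl_size = 16;
    ls.dirtbl = calloc(ls.dirtbl_size, sizeof(*ls.dirtbl));
    if (!ls.dirtbl)
        return -1;
    if (pthread_create(&tid, NULL, astd_thread, &ls) != 0) {
        free(ls.dirtbl);
        return -1;
    }
    release_lockspace(&ls);
    pthread_join(tid, NULL);
    assert(ls.dirtbl == NULL);
    return 0;
}
```

Calling run_demo() from main() and building with -pthread exercises
both threads. If the free() were moved ahead of astd_suspend(), the
worker could be inside astd_process_one() while the table goes away,
which is the use-after-free described in the report.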