NFS over GFS issue (fatal: assertion "!bd->bd_pinned && !buffer_busy(bh)" failed)
bz 455696
Several places in the code tried to remove buffers from the two
active items lists (transaction and glock ail), but each one used
a different set of criteria, and no two of them agreed.
This patch introduces a new function,
gfs_bd_ail_tryremove, that checks the condition of the buffer using
uniform criteria before removing it from the ail. All the places
that were doing this haphazardly now call the new function.
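
As a rough illustration of the "uniform criteria" idea, here is a
minimal userspace sketch. It is not the patch code: the struct, its
fields, and the model_ names are hypothetical stand-ins, and the
check is assumed to mirror the failed assertion in the subject line
(!bd->bd_pinned && !buffer_busy(bh)).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a GFS buffer descriptor. */
struct model_bufdata {
	int  pinned;        /* nonzero while a transaction has it pinned */
	bool busy;          /* models buffer_busy(bh): dirty or locked   */
	bool on_trans_ail;  /* linked on the transaction's ail list      */
	bool on_glock_ail;  /* linked on the glock's ail list            */
};

/*
 * Single place that decides whether a buffer may leave the ail.
 * Assumed criteria: not pinned and not busy. Returns true if the
 * buffer was taken off both lists, false if it was left alone.
 */
static bool model_ail_tryremove(struct model_bufdata *bd)
{
	assert(bd != NULL);

	if (bd->pinned || bd->busy)
		return false;   /* still in use, leave it on the ail */

	bd->on_trans_ail = false;
	bd->on_glock_ail = false;
	return true;
}
```

The point of the sketch is only that every caller goes through one
check, so a pinned or busy buffer can never be dropped from one list
by a caller with looser rules.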
So really, not counting the scsi device driver bug, there were five
main gfs bugs found with this one scenario:
1. There was a timing window whereby a process could mark a buffer
dirty while it was being synced to disk. This was fixed by
introducing a new semaphore, sd_log_flush_lock, which keeps
that from happening.
2. Buffers were being taken off the ail list at different times
using different criteria. That was fixed by the new function
as mentioned above.
3. Some buffers were not being added to the transaction, especially
in cases where the files were journaled, which happens mostly
with directory hash table data. That was fixed by the
introduction of the necessary calls to gfs_trans_add_bh.
4. The transaction glock was prematurely being released when the
glocks hit a capacity watermark. That's why it often took so
long to recreate some of these problems. To prevent this, new
code was added to function clear_glock so that it would only
release the transaction glock at unmount time. I'm using the
number of glockd daemons to determine whether the call was made
during normal operations or at unmount time. So in order to
accommodate that change, I had to fix a bit of code where the
number of glockd daemons was going negative.
5. When finding its place in the journal, function gfs_find_jhead
was not holding the journal log lock. That caused another
nasty timing window where the journal was being changed while
the proper journal location was being located.
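
To make item 4 more concrete, here is a small userspace sketch of
the daemon-count check. It is an illustration under assumptions, not
the patch itself: the model_ names and the struct are hypothetical,
"zero glockd daemons" is assumed to be what marks unmount time, and
clamping the decrement is only one possible way to keep the count
from going negative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-filesystem superblock data. */
struct model_sbd {
	int glockd_num;   /* number of glockd daemons still running */
};

/* A glockd daemon exits; never let the count drop below zero. */
static void model_glockd_exit(struct model_sbd *sdp)
{
	assert(sdp != NULL);

	if (sdp->glockd_num > 0)
		sdp->glockd_num--;
}

/*
 * Assumed rule: the transaction glock may only be released once no
 * glockd daemons remain, which is taken here to mean unmount time.
 * A negative count would break this test, hence the clamp above.
 */
static bool model_may_release_trans_glock(const struct model_sbd *sdp)
{
	assert(sdp != NULL);

	return sdp->glockd_num == 0;
}
```

With one daemon running the check refuses the release; after the
daemon exits (even if the exit path runs twice) the count stays at
zero and the release is allowed.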