e8dad1b NFS over GFS issue (fatal: assertion "!bd->bd_pinned && !buffer_busy(bh)" failed)

Authored and Committed by rpeterso 15 years ago
    NFS over GFS issue (fatal: assertion "!bd->bd_pinned && !buffer_busy(bh)" failed)
    
    bz 455696
    
    There were several places in the code that tried to remove buffers
    from the two active items lists (transaction and glock ail) but each
    one was using a different set of criteria, none of which were
    consistent.  This patch introduces a new function,
    gfs_bd_ail_tryremove, that checks the condition of the buffer using
    uniform criteria before removing it from the ail.  All the places
    that were doing this haphazardly now call the new function.
    
    So really, not counting the scsi device driver bug, there were five
    main gfs bugs found with this one scenario:
    
    1. There was a timing window whereby a process could mark a buffer
       dirty while it was being synced to disk.  This was fixed by
       introducing a new semaphore, sd_log_flush_lock, which keeps
       that from happening.
    2. Buffers were being taken off the ail list at different times
       using different criteria.  That was fixed by the new function
       as mentioned above.
    3. Some buffers were not being added to the transaction, especially
       in cases where the files were journaled, which happens mostly
       with directory hash table data.  That was fixed by the
       introduction of the necessary calls to gfs_trans_add_bh.
    4. The transaction glock was prematurely being released when the
       glocks hit a capacity watermark.  That's why it often took so
       long to recreate some of these problems.  To prevent this, new
       code was added to function clear_glock so that it would only
       release the transaction glock at unmount time.  I'm using the
       number of glockd daemons to determine whether the call was made
       during normal operations or at unmount time.  So in order to
       accommodate that change, I had to fix a bit of code where the
       number of glockd daemons was going negative.
    5. When finding its place in the journal, function gfs_find_jhead
       was not holding the journal log lock.  That caused another
       nasty timing window where the journal was being changed while
       the proper journal location was being located.
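
    To illustrate item 4, here is a minimal user-space sketch of a
    daemon counter that cannot go negative, with the counter used to
    decide whether the transaction glock may be released.  All names
    are hypothetical.  Treating a count of zero as "unmount time" is
    an assumption made for this sketch; the message above does not
    state the exact test used in clear_glock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-filesystem state (not the GFS superblock). */
struct model_sb {
    atomic_int glockd_count;   /* number of running glockd daemons */
};

/* Called when a daemon starts. */
static void model_glockd_start(struct model_sb *sb)
{
    atomic_fetch_add(&sb->glockd_count, 1);
}

/* Called when a daemon exits; refuses to drop the count below zero. */
static bool model_glockd_stop(struct model_sb *sb)
{
    int cur = atomic_load(&sb->glockd_count);

    while (cur > 0) {
        /* On failure, cur is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&sb->glockd_count, &cur, cur - 1)) {
            assert(cur - 1 >= 0);
            return true;
        }
    }
    return false;   /* already zero: nothing to decrement */
}

/* Assumption for this sketch: no daemons running means we are unmounting. */
static bool model_may_release_trans_glock(struct model_sb *sb)
{
    return atomic_load(&sb->glockd_count) == 0;
}
```

    If the count were allowed to go negative, a test of this kind
    could not tell normal operation from unmount, which is why the
    counter fix accompanies the clear_glock change.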
    
        
file modified
+11 -129
file modified
+26 -1
file modified
+12 -3
file modified
+2 -0
file modified
+5 -1
file modified
+4 -8
file modified
+2 -0
file modified
+80 -1
file modified
+1 -1
file modified
+1 -1
file modified
+3 -6