btrfs-progs: Exhaust delayed refs and dirty block groups to prevent delayed refs lost
[BUG]
Btrfs-progs sometimes fails to find certain extent backref when
committing transaction.
The most reliable way to reproduce it is fsck-test/013 on 64K page sized
system:
[...]
adding new data backref on 315859712 root 287 owner 292 offset 0 found 1
btrfs unable to find ref byte nr 31850496 parent 0 root 2 owner 0 offset 0
Failed to find [30867456, 168, 65536]
Also there are some github bug reports related to this problem.
[CAUSE]
Commit 909357e86799 ("btrfs-progs: Wire up delayed refs") introduced
delayed refs in btrfs-progs.
However in that commit, delayed refs are not run at correct timing.
That commit calls btrfs_run_delayed_refs() before
btrfs_write_dirty_block_groups(), which needs to update
BLOCK_GROUP_ITEMs in extent tree, thus could cause new delayed refs.
This means each time we commit a transaction, we may screw up the extent
tree by dropping some pending delayed refs, like:
Transaction 711:
btrfs_commit_transaction()
|- btrfs_run_delayed_refs()
| Now all delayed refs are written to extent tree
|
|- btrfs_write_dirty_block_groups()
| Needs to update extent tree root
| ADD_DELAYED_REF to 315859712.
| Delayed refs are attached to current trans handle.
|
|- __commit_transaction()
|- write_ctree_super()
|- btrfs_finish_extent_commit()
|- kfree(trans)
Now delayed ref for 315859712 are lost
Transaction 712:
Tree block 315859712 get dropped
btrfs_commit_transaction()
|- btrfs_run_delayed_refs()
|- run_one_delayed_ref()
|- __free_extent()
As previous ADD_DELAYED_REF to 315859712 is lost, extent tree
doesn't have any backref for 315859712, causing the bug
In fact, commit c31edf610cbe ("btrfs-progs: Fix false ENOSPC alert by
tracking used space correctly") detects the tree block leakage, but in
the reproducer we have too much noise, thus nobody notices the leakage
warning.
[FIX]
We can't just move btrfs_run_delayed_refs() after
btrfs_write_dirty_block_groups(), as during btrfs_run_delayed_refs(), we
can re-dirty block groups.
Thus we need to exhaust both delayed refs and dirty blocks.
This patch will call btrfs_write_dirty_block_groups() and
btrfs_run_delayed_refs() in a loop until both delayed refs and dirty
blocks are exhausted. Much like what we do in commit_cowonly_roots() in
kernel.
Also, to prevent such problem from happening again (and not to debug
such problem again), add extra check on delayed refs before freeing the
transaction handle.
Reported-by: Klemens Schölhorn <klemens@schoelhorn.eu>
Issue: #187
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>