Bug 291521: Cluster mirror can become out-of-sync if nominal I/O overl...
Another touch-up for this bug.
Bad news:
Because a node can cache the state of a region indefinitely (especially for
blocks that are used a lot, e.g. the journaling area of a file system), we must
deny writes to any region of the mirror that is not yet recovered. This applies
only to cluster mirroring. It means poor performance of nominal I/O during
recovery, probably really bad performance. However, it is absolutely necessary
for mirror reliability.
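
A rough sketch of the rule above, not the actual cluster mirror code. The names
RegionState and may_write are made up for illustration, and what happens to a
denied write (held, retried, or failed) is deliberately left out:

```python
from enum import Enum


class RegionState(Enum):
    # Hypothetical states for illustration only.
    NOT_RECOVERED = 0
    RECOVERED = 1


def may_write(region_states, region):
    """Allow a write only if the target region has already been recovered.

    Regions with no recorded state are treated as not yet recovered, which is
    the conservative choice (an assumption of this sketch).
    """
    state = region_states.get(region, RegionState.NOT_RECOVERED)
    return state is RegionState.RECOVERED
```
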
Good news:
The time I spent coding different fixes for this bug wasn't a complete waste.
I've been able to reuse some of that code to optimize the recovery process.
Now, rather than going through the mirror from front to back, recovery skips
ahead to regions that have pending writes. Bottom line: performance will be bad
during recovery, but it will be better than in RHEL4.5.
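
To illustrate the reordering, here is a minimal sketch, assuming regions are
numbered from 0 and that the set of regions with pending writes is known up
front. The function names are hypothetical, and picking the lowest-numbered
waiting region first is my assumption, not necessarily what the real code does:

```python
def next_region_to_recover(recovered, pending_writes, region_count):
    """Pick the next region to recover, or None when everything is recovered.

    Regions with pending writes go first; otherwise fall back to the plain
    front-to-back walk.
    """
    waiting = sorted(
        r for r in pending_writes
        if 0 <= r < region_count and r not in recovered
    )
    if waiting:
        return waiting[0]
    for r in range(region_count):
        if r not in recovered:
            return r
    return None


def recovery_order(region_count, pending_writes):
    """Return the order in which regions would be recovered."""
    recovered = set()
    order = []
    while True:
        r = next_region_to_recover(recovered, pending_writes, region_count)
        if r is None:
            return order
        recovered.add(r)
        order.append(r)
```

With 6 regions and writes pending on regions 4 and 2, recovery_order(6, {4, 2})
gives [2, 4, 0, 1, 3, 5]: the blocked writers are released first, then the rest
of the mirror is recovered front to back.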