The error is caused by two separate threads trying to update the RUV at the same time. As the attached stack traces show, one comes from replica_replace_ruv_tombstone and the other from dna. The issue is that replica_replace_ruv_tombstone modifies the RUV with the operation flags OP_FLAG_REPLICATED | OP_FLAG_REPL_FIXUP | OP_FLAG_REPL_RUV, which cause the backend lock to be skipped. If RUV updates are not allowed to bypass the backend lock, my stress test keeps running without the "Retry count exceeded" errors.
Stack traces showing the RUV being accessed by 2 threads at the same time: stacktraces.txt
Bug Description: ldbm_back_modify currently lets RUV updates proceed without regard for other threads in the backend's critical section. This makes it possible for two threads to modify the RUV at the same time in two different transactions, which causes DB deadlocks.
Fix Description: This patch changes the policy so that RUV updates no longer skip the backend serial lock.
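The policy change can be pictured with a small sketch. It is not the patch itself: the flag bit values and all three helper functions below are hypothetical, and the ticket does not say which of the three flags triggers the skip. The sketch assumes the skip is keyed on OP_FLAG_REPL_FIXUP and that the fix carves out operations that also carry OP_FLAG_REPL_RUV.

```c
#include <stdbool.h>

/* Hypothetical bit values, for illustration only; the real
 * definitions live in the 389-ds-base headers. */
#define OP_FLAG_REPLICATED 0x1u
#define OP_FLAG_REPL_FIXUP 0x2u
#define OP_FLAG_REPL_RUV   0x4u

/* Hypothetical helper: true if every bit of `flag` is set in `op_flags`. */
static bool op_flag_is_set(unsigned int op_flags, unsigned int flag)
{
    return (op_flags & flag) == flag;
}

/* Assumed old policy: any fixup operation bypasses the backend
 * serial lock, so an RUV update flagged as a fixup bypasses it too. */
static bool takes_backend_lock_old(unsigned int op_flags)
{
    return !op_flag_is_set(op_flags, OP_FLAG_REPL_FIXUP);
}

/* Assumed new policy: fixup operations still bypass the lock,
 * unless they are RUV updates, which are now serialized. */
static bool takes_backend_lock_new(unsigned int op_flags)
{
    return !op_flag_is_set(op_flags, OP_FLAG_REPL_FIXUP) ||
           op_flag_is_set(op_flags, OP_FLAG_REPL_RUV);
}
```

With the flag combination used by replica_replace_ruv_tombstone, the old predicate reports that the lock is skipped and the new one reports that it is taken, while an operation with no flags set takes the lock under both.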
Revised git patch file (389-ds-base-1.2.11 branch): 0001-Ticket-47412-Modify-RUV-should-be-serialized-in-ldbm.patch
Is it only modify ops that need to lock for the RUV? Is it possible that an add operation could have the same problem?
Replying to [comment:3 rmeggins]:
Yeah, I also thought about it, and I was not sure whether an add/delete/modrdn of the RUV on disk could happen under contention... :) I assumed it could not, but can it?
Replying to [comment:4 nhosoi]:
add - possibly - I'm not sure under what conditions the RUV can be added
delete/modrdn - probably not
Replying to [comment:6 rmeggins]:
All right. I'm testing with the backend lock enabled for "add RUV" as well...
Git patch file (master), take 2, which adds the same change to ldbm_back_add: 0001-Ticket-47412-Modify-RUV-should-be-serialized-in-ldbm.2.patch
Thanks to Rich for his comments. I've attached the second patch reflecting his comments.
Reviewed by Rich (Thank you!!)
Pushed to 389-ds-base-1.2.11: commit bc62f82
Metadata Update from @nhosoi: - Issue set to the milestone: 1.2.11.22
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/750
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: Fixed)