#47412 Modify RUV should be serialized in ldbm_back_modify/add
Closed: Fixed. Opened 6 years ago by nhosoi.

The error is caused by two separate threads trying to update the RUV at the same time. In the attached stack traces, one thread comes from replica_replace_ruv_tombstone and the other from dna. The problem is that replica_replace_ruv_tombstone modifies the RUV with the operation flags OP_FLAG_REPLICATED | OP_FLAG_REPL_FIXUP | OP_FLAG_REPL_RUV, which cause the modify to skip the backend lock. If RUV updates are not allowed to bypass the backend lock, my stress test keeps running without the "Retry count exceeded" errors.
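The "Retry count exceeded" symptom is consistent with a bounded retry loop around a transaction that keeps losing a deadlock. The sketch below assumes that model; it is not the actual ldbm_back_modify code, and `run_txn`, `update_with_retry`, the result codes and `RETRY_LIMIT` are all hypothetical names and values chosen for illustration.

```c
#include <stdio.h>

/* Hypothetical result codes; a real backend would see Berkeley DB codes
 * such as DB_LOCK_DEADLOCK instead. */
enum txn_result { TXN_OK, TXN_DEADLOCK };

/* Hypothetical retry bound; the real limit may differ. */
#define RETRY_LIMIT 5

/* Simulated transaction: the first `deadlocks` attempts lose a deadlock,
 * e.g. against another thread updating the same RUV entry. */
static enum txn_result run_txn(int attempt, int deadlocks)
{
    return attempt < deadlocks ? TXN_DEADLOCK : TXN_OK;
}

/* Abort and retry on deadlock; give up after RETRY_LIMIT attempts.
 * Returns the number of attempts used, or -1 if the limit was hit. */
int update_with_retry(int deadlocks)
{
    for (int attempt = 0; attempt < RETRY_LIMIT; attempt++) {
        if (run_txn(attempt, deadlocks) == TXN_OK)
            return attempt + 1;
        /* A real backend would abort the transaction here before retrying. */
    }
    fprintf(stderr, "Retry count exceeded\n");
    return -1;
}
```

For example, `update_with_retry(2)` succeeds on its third attempt, while `update_with_retry(5)` exhausts the bound and reports the error. Retrying only works around the race; two writers that keep deadlocking on the same entry can exhaust any bound, which is why serializing them is the more robust fix.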


Stack traces showing the RUV being accessed by 2 threads at the same time
stacktraces.txt

Bug Description: The current ldbm_back_modify lets an RUV update
proceed without respecting other threads in the backend's critical
area. This allows 2 threads to modify the RUV at the same time
in 2 different transactions, which causes DB deadlocks.

Fix Description: This patch changes the policy that let RUV updates
skip the backend serial lock; RUV updates now acquire the lock like
other write operations.
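The policy change can be sketched in C as follows. This is a minimal illustration, not the actual 389-ds-base code: the flag names come from this ticket but their numeric values are invented, `backend_lock`, `takes_lock_before`, `takes_lock_after` and the two lock helpers are hypothetical, and the sketch assumes the lock was being skipped because the RUV update is also marked as a fixup operation (OP_FLAG_REPL_FIXUP).

```c
#include <pthread.h>
#include <stdbool.h>

/* Flag names follow the ticket; the numeric values are invented. */
#define OP_FLAG_REPLICATED 0x1u
#define OP_FLAG_REPL_FIXUP 0x2u
#define OP_FLAG_REPL_RUV   0x4u

/* Hypothetical stand-in for the backend serial lock. */
static pthread_mutex_t backend_lock = PTHREAD_MUTEX_INITIALIZER;

/* Before the fix (assumed): fixup operations skipped the lock, and the
 * RUV tombstone update carries OP_FLAG_REPL_FIXUP, so it skipped too. */
bool takes_lock_before(unsigned flags)
{
    return !(flags & OP_FLAG_REPL_FIXUP);
}

/* After the fix (assumed): an RUV update takes the lock even when it is
 * also flagged as a fixup, so it is serialized with other writers. */
bool takes_lock_after(unsigned flags)
{
    return !(flags & OP_FLAG_REPL_FIXUP) || (flags & OP_FLAG_REPL_RUV);
}

/* Acquire the lock if the new policy requires it; the return value tells
 * the caller whether to release it after commit or abort. */
bool backend_lock_if_needed(unsigned flags)
{
    if (!takes_lock_after(flags))
        return false;
    pthread_mutex_lock(&backend_lock);
    return true;
}

/* Release the lock only if backend_lock_if_needed() actually took it. */
void backend_unlock_if_taken(bool taken)
{
    if (taken)
        pthread_mutex_unlock(&backend_lock);
}
```

With the flags from replica_replace_ruv_tombstone, `takes_lock_before` returns false while `takes_lock_after` returns true, so the RUV writer now waits for any other thread (such as dna) that holds the lock. In this sketch both the modify and add paths would call `backend_lock_if_needed` before starting their transaction, mirroring the take-2 patch that applies the same change to ldbm_back_add.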

Is it only modify ops that need to lock for the RUV? Is it possible that an add operation could have the same problem?

Replying to [comment:3 rmeggins]:

Yeah, I also thought about it, but I was not sure whether we ever add/delete/modrdn the RUV on disk under contention... :) I assumed we don't, but do we?

Replying to [comment:4 nhosoi]:

add: possibly, though I'm not sure under what conditions the RUV can be added. delete/modrdn: probably not.

Replying to [comment:6 rmeggins]:

All right. I'm testing with the backend lock enabled for "add RUV" as well...

git patch file (master) -- take 2: add the same change to ldbm_back_add
0001-Ticket-47412-Modify-RUV-should-be-serialized-in-ldbm.2.patch

Thanks to Rich for his comments. I've attached a second patch that addresses them.

Reviewed by Rich (Thank you!!)

Pushed to 389-ds-base-1.2.11: commit bc62f82

Metadata Update from @nhosoi:
- Issue set to the milestone: 1.2.11.22

2 years ago
