Ticket #47410 - changelog db deadlocks with DNA and replication
https://fedorahosted.org/389/ticket/47410
Reviewed by: mreynolds (Thanks!)
Branch: master
Fix Description: The deadlock involves an outer and an inner transaction in
one thread, and a replication changelog reader in another thread. The outer
transaction holds a write lock on certain changelog db (cldb) pages as a
result of a previous nested transaction (e.g. a DNA shared config area
update). The changelog reader, while positioning its cursor, acquires read
locks on certain other pages. When another inner write transaction occurs,
it may attempt to acquire a write lock on a page that the reader thread has
read-locked. That request can never be granted, because the reader will not
release its lock on the page until the outer transaction releases its write
lock, and the outer transaction cannot do that until the inner one completes.
The solution is to change the deadlock rejection policy used by the deadlock
detection thread. With DB_LOCK_MINWRITE instead of the default
DB_LOCK_YOUNGEST, the reader thread's lock request is the one that is
rejected. This means the code that positions the changelog cursor has to be
able to handle a DB_LOCK_DEADLOCK return.
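To illustrate why the policy choice matters here, the following is a minimal
model of victim selection, not Berkeley DB code and not the code in this
commit. The enum values, struct, and function names are hypothetical
stand-ins, and "youngest" is simplified to "highest locker id". It assumes the
inner transaction is the younger locker, which is what would make
DB_LOCK_YOUNGEST reject the writer rather than the reader.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the two policies named above; NOT the real db.h constants. */
enum sketch_policy { SKETCH_MINWRITE, SKETCH_YOUNGEST };

/* One participant in a deadlock cycle (hypothetical model, not a libdb type). */
struct sketch_locker {
    const char *name;
    int write_locks;      /* number of write locks currently held */
    unsigned long id;     /* assumption: larger id means created later (younger) */
};

/* Return the index of the locker whose lock request would be rejected. */
size_t sketch_pick_victim(const struct sketch_locker *l, size_t n,
                          enum sketch_policy policy)
{
    size_t victim = 0;

    assert(l != NULL && n > 0);
    for (size_t i = 1; i < n; i++) {
        if (policy == SKETCH_MINWRITE) {
            /* Prefer the locker holding the fewest write locks. */
            if (l[i].write_locks < l[victim].write_locks) {
                victim = i;
            }
        } else {
            /* Prefer the youngest locker. */
            if (l[i].id > l[victim].id) {
                victim = i;
            }
        }
    }
    return victim;
}
```

With a cycle made of a reader holding no write locks and a younger inner
transaction holding some, the MINWRITE branch selects the reader and the
YOUNGEST branch selects the inner transaction.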
Changing the deadlock rejection policy globally to DB_LOCK_MINWRITE means
that any search could get a DB_LOCK_DEADLOCK from a db or cursor get(), so
this needs extensive testing to make sure all such cases are handled.
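One common way for a caller to cope with a deadlock return from a get() is a
bounded retry. The sketch below shows only that pattern and is not the code in
this commit: SKETCH_LOCK_DEADLOCK is a stand-in for DB_LOCK_DEADLOCK (not its
real value), and sketch_get_fn, sketch_get_with_retry, and sketch_flaky_get
are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_LOCK_DEADLOCK (-1)   /* stand-in for DB_LOCK_DEADLOCK */

/* Hypothetical callback standing in for a db or cursor get(). */
typedef int (*sketch_get_fn)(void *ctx);

/*
 * Call get() up to max_tries times, retrying only when it reports a deadlock.
 * Returns the last return code; *tries_used (if non-NULL) gets the call count.
 */
int sketch_get_with_retry(sketch_get_fn get, void *ctx, int max_tries,
                          int *tries_used)
{
    int rc = SKETCH_LOCK_DEADLOCK;
    int tries = 0;

    assert(get != NULL && max_tries > 0);
    while (tries < max_tries) {
        tries++;
        rc = get(ctx);
        if (rc != SKETCH_LOCK_DEADLOCK) {
            break;  /* success or a non-retryable error */
        }
        /* Assumption: a real caller would release or reset its cursor here. */
    }
    if (tries_used != NULL) {
        *tries_used = tries;
    }
    return rc;
}

/* Test double: reports a deadlock until the counter in ctx reaches zero. */
int sketch_flaky_get(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return SKETCH_LOCK_DEADLOCK;
    }
    return 0;
}
```

The retry limit keeps a persistently rejected reader from looping forever; a
caller that exhausts it still sees the deadlock code and can report the error.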
Platforms tested: RHEL6 x86_64
Flag Day: no
Doc impact: no