Ticket #360 - ldapmodify returns Operations error
https://fedorahosted.org/389/ticket/360
Resolves: Ticket #360
Bug Description: ldapmodify returns Operations error
Reviewed by: mreynolds (Thanks!)
Branch: master
Fix Description:
1) Fix handling of DB_LOCK_DEADLOCK conditions. When a database operation
returns DB_LOCK_DEADLOCK, all cursors must be closed, and the transaction
aborted and retried. If not in a transaction, the operation may be retried
immediately. This fix adds this logic to many places in the db code where
it was lacking.
2) Fix resetting of the data when an operation has to be retried. When
a transaction has to be retried, we must reset the data to the state it was
in before any of the operations in the transaction were attempted. This
includes the entry cache state, whose reset was lacking in a number of ways. One
major problem with resetting the cache is that cache_add_tentative adds an
entry to the dncache, but not the idcache, in order to reserve a space in
the cache, and to prevent other entries with the same DN from being added
while the operation is in progress. There was no good way to remove this
entry. In the case of modrdn, removing the tentative entry would also
remove the real entry since they share the same entryid. This fix also
makes sure that in error conditions, temporary entries are removed from
the cache and properly freed, and real entries are "rolled back" into the
cache.
3) Added a transaction stress thread. This thread can simulate various types
of read and write locking that can cause conflicts and trigger regular
database operations to return DB_LOCK_DEADLOCK. The usual culprit is read
locks which are held on pages that are searched outside of a transaction.
The stress thread can lock, via read cursors, all of the pages in all of
the indexes in the database, and hold these pages for some set period of
time, then loop and do it again.
4) It is quite easy to get the database into a deadlock situation, where an
update operation is waiting for a read lock to be released in order to
write to a page, while a search operation is waiting for a write lock to
be released in order to read lock the page. If we are going to allow
concurrent searches during update operations, without making search requests
transacted, we need to have some way to detect these deadlocks. The fastest
way to do it is to use the DB_TXN_NOWAIT flag when starting transactions.
This tells bdb to return immediately with a DB_LOCK_DEADLOCK if the
transaction cannot proceed due to a locked page (e.g. a search request
has read locked a page). Alternatively, we could have transactions wait
for some specified period of time, but if we think that this type of thread
contention is going to be rare, it makes sense to return immediately, if
our deadlock handling is fast and robust.
5) Fixed some memory leaks
6) The txn_test thread needs to know when the backend databases are
available - added a backend flag BE_STATE_STOPPING for this purpose
7) If the op was retried RETRY_TIMES times, return LDAP_BUSY instead of
LDAP_OPERATIONS_ERROR - the problem really is that the server is busy with
other transactions, and the operation could go through if the client
were to retry.
8) Renaming an entry without changing the DN (e.g. changing only the case)
does not cache the entry, so handle that case
9) Added a delay when a deadlock is encountered in modrdn - same as the
other add/mod/del cases
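For illustration only, the retry handling in items 1 and 7 can be sketched
as a small self-contained C model. It does not use the Berkeley DB API or
the server's code: every sketch_ name and SKETCH_ constant is a hypothetical
stand-in (SKETCH_DEADLOCK for DB_LOCK_DEADLOCK, SKETCH_RETRY_TIMES for
RETRY_TIMES), and the result codes use the RFC 4511 values for busy (51)
and operationsError (1). Resetting the entry cache and the delay between
retries are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical stand-ins, not the server's identifiers. */
#define SKETCH_DEADLOCK              (-1) /* plays the role of DB_LOCK_DEADLOCK */
#define SKETCH_RETRY_TIMES           3    /* plays the role of RETRY_TIMES */
#define SKETCH_LDAP_SUCCESS          0
#define SKETCH_LDAP_OPERATIONS_ERROR 1    /* RFC 4511 operationsError */
#define SKETCH_LDAP_BUSY             51   /* RFC 4511 busy */

/* Minimal transaction model: just enough state to observe the control flow. */
typedef struct sketch_txn {
    int open_cursors;
    int active;
    int aborts;
    int commits;
} sketch_txn;

typedef int (*sketch_op_fn)(sketch_txn *txn, void *arg);

static void sketch_txn_begin(sketch_txn *t)     { t->active = 1; }
static void sketch_close_cursors(sketch_txn *t) { t->open_cursors = 0; }
static void sketch_txn_abort(sketch_txn *t)     { t->active = 0; t->aborts++; }
static void sketch_txn_commit(sketch_txn *t)    { t->active = 0; t->commits++; }

/* Run op inside a transaction, retrying only when it reports a deadlock. */
int sketch_run_with_retry(sketch_txn *txn, sketch_op_fn op, void *arg)
{
    assert(txn != NULL && op != NULL);

    for (int attempt = 0; attempt < SKETCH_RETRY_TIMES; attempt++) {
        sketch_txn_begin(txn);
        int rc = op(txn, arg);

        /* Cursors are closed before the transaction is resolved. */
        sketch_close_cursors(txn);

        if (rc == 0) {
            sketch_txn_commit(txn);
            return SKETCH_LDAP_SUCCESS;
        }

        sketch_txn_abort(txn);
        if (rc != SKETCH_DEADLOCK) {
            /* A non-deadlock failure is not retried. */
            return SKETCH_LDAP_OPERATIONS_ERROR;
        }
        /* Deadlock: fall through and retry with a fresh transaction. */
    }

    /* Every attempt deadlocked: the server is busy, the client may retry. */
    return SKETCH_LDAP_BUSY;
}

/* Example operation: reports a deadlock until *arg counts down to zero. */
int sketch_flaky_op(sketch_txn *txn, void *arg)
{
    int *remaining = arg;
    txn->open_cursors++;
    if (*remaining > 0) {
        (*remaining)--;
        return SKETCH_DEADLOCK;
    }
    return 0;
}
```

With an operation that deadlocks twice and then succeeds, the loop aborts
twice and commits once. An operation that always deadlocks ends in
SKETCH_LDAP_BUSY after SKETCH_RETRY_TIMES attempts.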
Platforms tested: RHEL6 x86_64, Fedora 17
Flag Day: yes
Doc impact: yes