5db9031 Bug 571677 - Busy replica on consumers when directly deleting a replication conflict

Authored and Committed by rmeggins 14 years ago
    Bug 571677 - Busy replica on consumers when directly deleting a replication conflict
    
    https://bugzilla.redhat.com/show_bug.cgi?id=571677
    Resolves: bug 571677
    Bug Description: Busy replica on consumers when directly deleting a replication conflict
    Reviewed by: nhosoi (Thanks!)
    Branch: Directory_Server_8_2_Branch
    Fix Description: In some cases, update resolution procedure (urp) fixup
    operations can be called from the bepreop stage of other operations.  The
    ldbm_back_delete() and ldbm_back_modify() code locks the target entry in
    the cache.  If a bepreop then operates on the same entry and tries to
    acquire the lock on that entry, the thread deadlocks.
    The modrdn code does not acquire the cache lock on the target entries
    before calling the bepreops.  The modify and delete code does not acquire
    the cache lock on the target entries before calling the bepostops.
    I tried unlocking the target entry before calling the bepreops, then
    relocking the entry just after.  This makes the deadlock go away, but I do
    not know whether it will lead to race conditions.  The modrdn code has
    always worked this way, and there are no known race conditions with that
    code.
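
    The locking problem and the unlock/relock workaround described above can
    be sketched with plain pthreads.  This is only an illustration, not the
    server code: struct demo_entry, demo_preop() and the demo_op_*() functions
    are hypothetical names, and an error-checking mutex is assumed so that a
    relock by the owning thread returns EDEADLK instead of hanging.

```c
#define _XOPEN_SOURCE 700 /* for PTHREAD_MUTEX_ERRORCHECK */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a cached entry; not the server's real struct. */
struct demo_entry {
    pthread_mutex_t lock;
};

/* Use an error-checking mutex so that a relock by the owning thread
 * returns EDEADLK instead of blocking forever. */
static void demo_entry_init(struct demo_entry *e)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&e->lock, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);
}

/* Hypothetical bepreop-style hook that locks the same entry itself. */
static int demo_preop(struct demo_entry *e)
{
    int rc = pthread_mutex_lock(&e->lock);
    if (rc == 0) {
        pthread_mutex_unlock(&e->lock);
    }
    return rc;
}

/* Delete/modify pattern: the hook runs while the entry is still locked. */
static int demo_op_hook_while_locked(struct demo_entry *e)
{
    int rc = pthread_mutex_lock(&e->lock);
    assert(rc == 0);
    rc = demo_preop(e); /* fails: this thread already holds the lock */
    pthread_mutex_unlock(&e->lock);
    return rc;
}

/* Workaround pattern: release the entry around the hook, then relock. */
static int demo_op_unlock_around_hook(struct demo_entry *e)
{
    int rc = pthread_mutex_lock(&e->lock);
    assert(rc == 0);
    pthread_mutex_unlock(&e->lock); /* another thread could take the entry here */
    rc = demo_preop(e);
    int relock = pthread_mutex_lock(&e->lock);
    assert(relock == 0);
    /* ... the rest of the operation would run here ... */
    pthread_mutex_unlock(&e->lock);
    return rc;
}
```

    The window between the unlock and the relock is where a race could in
    principle occur, which is the open question raised above.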
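    I think the most robust fix for this issue would be to introduce some sort
    of counting semaphore that records the owning thread, instead of a simple
    mutex, on the cached entry, so that the thread already holding the entry
    can acquire it again (in effect a reentrant lock).  Then cache_lock_entry
    would look something like this: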
            if entry->sem == 0
               entry->sem++ /* acquire entry */
               entry->locking_thread = this_thread
            else if entry->locking_thread == this_thread
               entry->sem++ /* increment count on this entry */
            else
               wait_for_sem(entry->sem) /* wait until released */
               entry->sem++ /* then acquire entry */
               entry->locking_thread = this_thread
    and cache_unlock_entry would look something like this:
            entry->sem--
            if entry->sem == 0
                entry->locking_thread = 0
                wake_waiters(entry->sem) /* let a waiting thread acquire */
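
    A minimal C sketch of the reentrant lock outlined in the pseudocode above,
    assuming pthreads.  It is not part of this commit: struct demo_entry_lock
    and the demo_*() functions are hypothetical names, count plays the role of
    entry->sem, and a mutex plus a condition variable stand in for
    wait_for_sem().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-entry lock; not the server's real data structure. */
struct demo_entry_lock {
    pthread_mutex_t mu;      /* protects owner and count */
    pthread_cond_t released; /* signaled when count drops to 0 */
    pthread_t owner;         /* meaningful only while count > 0 */
    int count;               /* plays the role of entry->sem */
};

static void demo_lock_init(struct demo_entry_lock *l)
{
    pthread_mutex_init(&l->mu, NULL);
    pthread_cond_init(&l->released, NULL);
    l->count = 0;
}

/* Acquire the entry, or bump the count if this thread already owns it. */
static void demo_lock_entry(struct demo_entry_lock *l)
{
    pthread_t self = pthread_self();

    pthread_mutex_lock(&l->mu);
    if (l->count > 0 && pthread_equal(l->owner, self)) {
        l->count++; /* re-entry by the owning thread */
    } else {
        while (l->count > 0) {
            /* held by another thread: wait until it is released */
            pthread_cond_wait(&l->released, &l->mu);
        }
        l->owner = self;
        l->count = 1;
    }
    pthread_mutex_unlock(&l->mu);
}

/* Drop one level of ownership; wake a waiter when fully released. */
static void demo_unlock_entry(struct demo_entry_lock *l)
{
    pthread_mutex_lock(&l->mu);
    assert(l->count > 0 && pthread_equal(l->owner, pthread_self()));
    if (--l->count == 0) {
        pthread_cond_signal(&l->released);
    }
    pthread_mutex_unlock(&l->mu);
}

/* Helper for inspection only: current nesting depth of the lock. */
static int demo_lock_depth(struct demo_entry_lock *l)
{
    pthread_mutex_lock(&l->mu);
    int depth = l->count;
    pthread_mutex_unlock(&l->mu);
    return depth;
}
```

    With a lock like this, a bepreop running on the thread that already holds
    the entry would just raise the count instead of blocking.  A pthread mutex
    created with the PTHREAD_MUTEX_RECURSIVE type gives similar behavior
    without hand-written bookkeeping.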
    Platforms tested: RHEL5 x86_64
    Flag Day: no
    Doc impact: no
    (cherry picked from commit eac3f15f2209719e05640e1576b4273d03bef079)