#48179 When starting a replica agreement a deadlock can occur with an op updating nsuniqueid index
Closed: Fixed None Opened 4 years ago by tbordaz.

The version was 389-ds-base-1.3.3.9-1 (F21).

A write operation (such as a DEL) can update the nsuniqueid index. In betxn_postop, when it updates the changelog/RUV, it also updates the replica agreements and therefore tries to acquire the RA locks.

If a replica agreement is started at the same time, it triggers an internal search to retrieve the current RUV. That internal search uses nsuniqueid, so the thread accesses the nsuniqueid index while it is holding the RA lock.
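The two paragraphs above describe a lock-order inversion: one thread holds the nsuniqueid index page and wants the RA lock, the other holds the RA lock and wants the index page. A minimal sketch of that wait-for cycle follows. The resource names `ra_lock` and `nsuniqueid_page` and the thread labels are illustrative assumptions, not identifiers from the server code. Since the RA lock is an NSPR lock rather than a database lock, the Berkeley DB deadlock detector presumably cannot see the whole cycle, which would explain why the hang is never broken.

```python
# Sketch of the wait-for cycle described in this ticket.
# All names below are hypothetical labels chosen for illustration.

# Resources each thread currently holds.
holds = {
    "DEL thread (betxn_postop)": {"nsuniqueid_page"},
    "agmt_start thread": {"ra_lock"},
}

# The single resource each thread is blocked on.
waits_for = {
    "DEL thread (betxn_postop)": "ra_lock",
    "agmt_start thread": "nsuniqueid_page",
}


def find_deadlock(holds, waits_for):
    """Return the list of threads forming a wait cycle, or [] if there is none."""
    # Map each resource to the thread that owns it.
    owner = {res: thread for thread, resources in holds.items() for res in resources}
    for start in waits_for:
        path = []
        thread = start
        # Follow "waits for the owner of ..." edges until they end or repeat.
        while thread is not None and thread not in path:
            path.append(thread)
            thread = owner.get(waits_for.get(thread))
        if thread is not None:
            return path[path.index(thread):]
    return []


cycle = find_deadlock(holds, waits_for)
print(cycle)

# If the agreement thread did not hold the RA lock during its search,
# there would be no cycle.
no_cycle = find_deadlock(
    {"DEL thread (betxn_postop)": {"nsuniqueid_page"}, "agmt_start thread": set()},
    waits_for,
)
print(no_cycle)
```

The second call illustrates the general way out of such a cycle: never perform the index search while the RA lock is held, or always take the two resources in the same order.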

Could be related to fix:
Ticket 47368 - IPA server dirsrv RUV entry data excluded from replication

How to reproduce:
I reproduced it several times on an F21 VM with ticket47787_test.py


Here is a first analysis of the deadlock:
{{{
Thread 23 (Thread 0x7f15c6ffd700 (LWP 7007)):
#0 0x00007f15ea129590 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f15e3f681fb in __db_pthread_mutex_condwait (env=0x7f15ed280d00, mutex=0, timespec=0x0, mutexp=<optimized out>) at ../../src/mutex/mut_pthread.c:321
#2 __db_hybrid_mutex_suspend (env=env@entry=0x7f15ed280d00, mutex=mutex@entry=1108, timespec=timespec@entry=0x0, exclusive=exclusive@entry=1) at ../../src/mutex/mut_pthread.c:577
#3 0x00007f15e3f6759b in __db_tas_mutex_lock_int (nowait=0, timeout=0, mutex=<optimized out>, env=0x7f15ed280d00) at ../../src/mutex/mut_tas.c:255
#4 __db_tas_mutex_lock (env=env@entry=0x7f15ed280d00, mutex=1108, timeout=timeout@entry=0) at ../../src/mutex/mut_tas.c:286
#5 0x00007f15e4011a91 in __lock_get_internal (lt=0x7f15ed281580, sh_locker=<optimized out>, flags=<optimized out>, obj=<optimized out>, lock_mode=<optimized out>, timeout=0, lock=0x7f15c6fe9070) at ../../src/lock/lock.c:989
#6 0x00007f15e4012657 in __lock_get (env=0x7f15deb033e4, env@entry=0x7f15ed280d00, locker=0x0, flags=1, obj=0xffffffffffffffff, obj@entry=0x7f15b8007150, lock_mode=3736089344, lock=0x0, lock@entry=0x7f15c6fe9070) at ../../src/lock/lock.c:469
#7 0x00007f15e403e6af in __db_lget (dbc=dbc@entry=0x7f15b8007060, action=action@entry=0, pgno=1, mode=<optimized out>, mode@entry=DB_LOCK_READ, lkflags=lkflags@entry=0, lockp=lockp@entry=0x7f15c6fe9070) at ../../src/db/db_meta.c:1257
#8 0x00007f15e3f845d9 in __bam_get_root (dbc=dbc@entry=0x7f15b8007060, root_pgno=root_pgno@entry=0, slevel=slevel@entry=1, flags=flags@entry=1409, stack=stack@entry=0x7f15c6fe91a4) at ../../src/btree/bt_search.c:202
#9 0x00007f15e3f8491b in __bam_search (dbc=0x7f15b8007060, root_pgno=<optimized out>, key=0x7f15c6fe94b0, flags=1409, slevel=1, recnop=0x0, exactp=0x7f15c6fe92f4) at ../../src/btree/bt_search.c:309
#10 0x00007f15e3f70144 in __bamc_search (dbc=0x7f15b8007060, root_pgno=root_pgno@entry=0, key=0x1, flags=26, exactp=0x7f15c6fe92f4) at ../../src/btree/bt_cursor.c:2804
#11 0x00007f15e3f71dff in __bamc_get (dbc=0x7f15b8007060, key=<optimized out>, data=<optimized out>, flags=<optimized out>, pgnop=0x7f15c6fe9394) at ../../src/btree/bt_cursor.c:1099
#12 0x00007f15e402aba3 in __dbc_iget (dbc=0x7f15b800afd0, key=0x7f15c6fe94b0, data=0x7f15c6fe94e0, flags=26) at ../../src/db/db_cam.c:952
#13 0x00007f15e402b56d in __dbc_get (dbc=dbc@entry=0x7f15b800afd0, key=key@entry=0x7f15c6fe94b0, data=data@entry=0x7f15c6fe94e0, flags=flags@entry=2074) at ../../src/db/db_cam.c:770
#14 0x00007f15e403a092 in __dbc_get_pp (dbc=0x7f15b800afd0, key=0x7f15c6fe94b0, data=0x7f15c6fe94e0, flags=2074) at ../../src/db/db_iface.c:2361
#15 0x00007f15e0da88de in idl_new_fetch () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#16 0x00007f15e0db6e4b in index_read_ext_allids () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#17 0x00007f15e0db72f2 in index_read_ext () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#18 0x00007f15e0db730b in index_read () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#19 0x00007f15e0deedcf in uniqueid2entry () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#20 0x00007f15e0da3bc4 in find_entry_internal.isra () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#21 0x00007f15e0da4099 in find_entry () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#22 0x00007f15e0dddebb in ldbm_back_search () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#23 0x00007f15ec391b60 in op_shared_search () from /usr/lib64/dirsrv/libslapd.so.0
#24 0x00007f15ec3a1c2e in search_internal_callback_pb () from /usr/lib64/dirsrv/libslapd.so.0
#25 0x00007f15ec3a1eb8 in search_internal_pb () from /usr/lib64/dirsrv/libslapd.so.0
#26 0x00007f15dfe98048 in agmt_set_maxcsn () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#27 0x00007f15dfe98425 in agmt_start () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#28 0x00007f15dfe985df in agmt_set_enabled_from_entry () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#29 0x00007f15dfe99828 in agmtlist_modify_callback () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#30 0x00007f15ec357feb in dse_call_callback.isra () from /usr/lib64/dirsrv/libslapd.so.0
#31 0x00007f15ec359ff9 in dse_modify () from /usr/lib64/dirsrv/libslapd.so.0
#32 0x00007f15ec38b035 in op_shared_modify () from /usr/lib64/dirsrv/libslapd.so.0
#33 0x00007f15ec38c337 in do_modify () from /usr/lib64/dirsrv/libslapd.so.0
#34 0x00007f15ec86f985 in connection_threadmain ()
#35 0x00007f15ea783cab in _pt_root (arg=0x7f15ed3e4020) at ../../../nspr/pr/src/pthreads/ptthread.c:212
#36 0x00007f15ea12452a in start_thread () from /lib64/libpthread.so.0
#37 0x00007f15e9e6079d in clone () from /lib64/libc.so.6

24 dd=21 locks held 0 write locks 0 pid/thread 6986/7F15C6FFD700 flags 0 priority 100
24 dd=21 locks held 0 write locks 0 pid/thread 6986/139731509696256 flags 0 priority 100
24 READ 1 WAIT userRoot/nsuniqueid.db page 1

Thread 30 (Thread 0x7f15ceffd700 (LWP 7000)):
#0 0x00007f15ea12bf1d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f15ea1269be in pthread_mutex_lock () from /lib64/libpthread.so.0
#2 0x00007f15ea77e0b9 in PR_Lock (lock=0x7f15b00084e0) at ../../../nspr/pr/src/pthreads/ptsynch.c:177
#3 0x00007f15dfe94555 in agmt_get_replarea () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#4 0x00007f15dfe99f99 in agmtlist_get_next_agreement_for_replica () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#5 0x00007f15dfe97ad7 in agmt_update_maxcsn () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#6 0x00007f15dfea358e in write_changelog_and_ruv () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#7 0x00007f15dfea48de in multimaster_be_betxnpostop_delete () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#8 0x00007f15ec39cabf in plugin_call_func () from /usr/lib64/dirsrv/libslapd.so.0
#9 0x00007f15ec39cde3 in plugin_call_plugins () from /usr/lib64/dirsrv/libslapd.so.0
#10 0x00007f15e0dc9e26 in ldbm_back_delete () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#11 0x00007f15ec351960 in op_shared_delete () from /usr/lib64/dirsrv/libslapd.so.0
#12 0x00007f15ec351c23 in do_delete () from /usr/lib64/dirsrv/libslapd.so.0
#13 0x00007f15ec86fa0e in connection_threadmain ()
#14 0x00007f15ea783cab in _pt_root (arg=0x7f15ed3f5ec0) at ../../../nspr/pr/src/pthreads/ptthread.c:212
#15 0x00007f15ea12452a in start_thread () from /lib64/libpthread.so.0
#16 0x00007f15e9e6079d in clone () from /lib64/libc.so.6

800000a8 dd= 1 locks held 20 write locks 13 pid/thread 6986/7F15CEFFD700 flags 0 priority 100
800000a8 dd= 1 locks held 20 write locks 13 pid/thread 6986/139731643913984 flags 0 priority 100
800000a8 WRITE 1 HELD /var/lib/dirsrv/slapd-master_2/changelogdb/f502a608-efe811e4-908cc50a-b5be7e58_55434ed8000000010000.db page 3
800000a8 WRITE 1 HELD userRoot/numsubordinates.db page 1
800000a8 WRITE 1 HELD userRoot/id2entry.db page 2
800000a8 WRITE 1 HELD userRoot/nscpEntryDN.db page 1
800000a8 WRITE 1 HELD userRoot/nsTombstoneCSN.db page 1
800000a8 WRITE 9 HELD userRoot/entryrdn.db page 1
800000a8 WRITE 4 HELD userRoot/ancestorid.db page 1
800000a8 READ 2 HELD userRoot/ancestorid.db page 1
800000a8 READ 10 HELD userRoot/entryrdn.db page 1
800000a8 WRITE 3 HELD userRoot/parentid.db page 1
800000a8 READ 1 HELD userRoot/parentid.db page 1
800000a8 WRITE 3 HELD userRoot/nsuniqueid.db page 1
800000a8 WRITE 28 HELD userRoot/cn.db page 1
800000a8 READ 14 HELD userRoot/cn.db page 1
800000a8 WRITE 28 HELD userRoot/sn.db page 1
800000a8 READ 14 HELD userRoot/sn.db page 1
800000a8 WRITE 5 HELD userRoot/objectclass.db page 1
800000a8 READ 2 HELD userRoot/objectclass.db page 1
800000a8 WRITE 1 HELD userRoot/id2entry.db page 3
800000a8 READ 5 HELD userRoot/nsuniqueid.db page 1
}}}
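The lock-table lines in the trace above (locker id, mode, count, HELD/WAIT, database file, page) are what tie the two threads together: locker 24 waits for a READ lock on userRoot/nsuniqueid.db page 1 while locker 800000a8 holds a WRITE lock on the same page. As a rough aid for reading larger dumps, here is a small sketch that parses lines of that shape and reports who blocks each waiter. The regular expression is inferred only from the lines shown in this ticket, and the function names are hypothetical helpers, so treat it as an assumption about the format rather than a general parser.

```python
import re

# Line shape inferred from this ticket, e.g.:
#   800000a8 WRITE 3 HELD userRoot/nsuniqueid.db page 1
LOCK_LINE = re.compile(
    r"^(?P<locker>[0-9a-fA-F]+)\s+(?P<mode>READ|WRITE)\s+(?P<count>\d+)\s+"
    r"(?P<status>HELD|WAIT)\s+(?P<db>\S+)\s+page\s+(?P<page>\d+)$"
)


def parse_locks(text):
    """Extract (locker, mode, status, db, page) tuples; other lines are skipped."""
    locks = []
    for line in text.splitlines():
        match = LOCK_LINE.match(line.strip())
        if match:
            d = match.groupdict()
            locks.append((d["locker"], d["mode"], d["status"], d["db"], int(d["page"])))
    return locks


def blockers(locks):
    """Map each waiting (locker, db, page) to the other lockers holding a
    conflicting lock on that page (a conflict means at least one side is WRITE)."""
    result = {}
    for locker, mode, status, db, page in locks:
        if status != "WAIT":
            continue
        result[(locker, db, page)] = sorted({
            other
            for other, other_mode, other_status, other_db, other_page in locks
            if other_status == "HELD"
            and (other_db, other_page) == (db, page)
            and other != locker
            and (other_mode == "WRITE" or mode == "WRITE")
        })
    return result


# A few lines copied from the dump in this ticket.
sample = """\
24 READ 1 WAIT userRoot/nsuniqueid.db page 1
800000a8 WRITE 3 HELD userRoot/nsuniqueid.db page 1
800000a8 READ 5 HELD userRoot/nsuniqueid.db page 1
800000a8 WRITE 1 HELD userRoot/id2entry.db page 2
"""

result = blockers(parse_locks(sample))
print(result)
```

This only covers the database side of the cycle. The other half, Thread 30 blocked in PR_Lock, is visible only in the stack trace.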

I also ran into this deadlock while investigating ticket 47788.

Looks good!

An extremely minor issue. We don't need this cast any more! ;)

705         (Slapi_DN *)repl_sdn,

I removed that cast. Thanks for the review, Noriko!

f5d2445..eb3086d master -> master
commit eb3086d
Author: Mark Reynolds mreynolds@redhat.com
Date: Fri Jul 17 15:08:00 2015 -0400

8600a5e..23a3ff6 389-ds-base-1.3.4 -> 389-ds-base-1.3.4
commit 23a3ff6

Metadata Update from @tbordaz:
- Issue assigned to mreynolds
- Issue set to the milestone: 1.3.4.2

2 years ago
