#48166 libdb deadlock while performing modrdns
Closed: wontfix a month ago by spichugi. Opened 5 years ago by mreynolds.

While running a modrdn stress test it appears that there is a deadlock when between the modrdn operation and the checkpointing thread

#0  0x00007f92ddb7eca0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f92d687a1c3 in __db_hybrid_mutex_suspend () from /lib64/libdb-5.3.so
#2  0x00007f92d68795a8 in __db_tas_mutex_lock () from /lib64/libdb-5.3.so
#3  0x00007f92d69946fb in __memp_fget () from /lib64/libdb-5.3.so
#4  0x00007f92d68970b1 in __bam_search () from /lib64/libdb-5.3.so
#5  0x00007f92d6882126 in __bamc_search () from /lib64/libdb-5.3.so
#6  0x00007f92d68865a4 in __bamc_put () from /lib64/libdb-5.3.so
#7  0x00007f92d693db85 in __dbc_iput () from /lib64/libdb-5.3.so
#8  0x00007f92d6938d1e in __db_put () from /lib64/libdb-5.3.so
#9  0x00007f92d694e5d4 in __db_put_pp () from /lib64/libdb-5.3.so
#10 0x00007f92d46de02f in idl_new_insert_key (be=<optimized out>, db=<optimized out>, key=<optimized out>, id=4, txn=<optimized out>,
    a=<optimized out>, disposition=0x0) at /home/mareynol/workspaces/389-ds-base/ds/ldap/servers/slapd/back-ldbm/idl_new.c:830
#11 0x00007f92d46dcec5 in idl_insert_key (be=be@entry=0xd2e9d0, db=db@entry=0x7f92340029f0, key=key@entry=0x7f92b2fefc10, id=id@entry=4,
    txn=txn@entry=0x7f9268004e40, a=a@entry=0xe09f70, disposition=disposition@entry=0x0)
    at /home/mareynol/workspaces/389-ds-base/ds/ldap/servers/slapd/back-ldbm/idl_shim.c:144
#12 0x00007f92d46ecbe8 in addordel_values_sv (be=be@entry=0xd2e9d0, db=0x7f92340029f0, indextype=<optimized out>, vals=<optimized out>,
    id=id@entry=4, flags=flags@entry=1, txn=txn@entry=0x7f92b2ff2550, a=0xe09f70, idl_disposition=idl_disposition@entry=0x0,
    buffer_handle=buffer_handle@entry=0x0, type=<optimized out>)
    at /home/mareynol/workspaces/389-ds-base/ds/ldap/servers/slapd/back-ldbm/index.c:1962
#13 0x00007f92d46ed642 in index_addordel_values_ext_sv (be=be@entry=0xd2e9d0, type=<optimized out>, vals=0x7f9268008a10,
    evals=evals@entry=0x0, id=id@entry=4, flags=flags@entry=1, txn=txn@entry=0x7f92b2ff2550, idl_disposition=idl_disposition@entry=0x0,
    buffer_handle=buffer_handle@entry=0x0) at /home/mareynol/workspaces/389-ds-base/ds/ldap/servers/slapd/back-ldbm/index.c:2156
#14 0x00007f92d46edbe4 in index_addordel_values_sv (be=be@entry=0xd2e9d0, type=<optimized out>, vals=<optimized out>,
    evals=evals@entry=0x0, id=id@entry=4, flags=flags@entry=1, txn=txn@entry=0x7f92b2ff2550)
    at /home/mareynol/workspaces/389-ds-base/ds/ldap/servers/slapd/back-ldbm/index.c:2021
#15 0x00007f92d46ee3ab in index_add_mods (be=0xd2e9d0, mods=<optimized out>, olde=olde@entry=0x7f9294009a50,
    newe=newe@entry=0x7f926802e960, txn=txn@entry=0x7f92b2ff2550)
    at /home/mareynol/workspaces/389-ds-base/ds/ldap/servers/slapd/back-ldbm/index.c:674
#16 0x00007f92d470bde0 in modrdn_rename_entry_update_indexes (ptxn=ptxn@entry=0x7f92b2ff2550, pb=pb@entry=0x7f92b2ff4ae0, e=0x7f9294009a50,
    ec=ec@entry=0x7f92b2ff2578, smods1=smods1@entry=0x7f92b2ff2670, smods2=smods2@entry=0x7f92b2ff2690, smods3=smods3@entry=0x7f92b2ff2650,
    li=<optimized out>) at /home/mareynol/workspaces/389-ds-base/ds/ldap/servers/slapd/back-ldbm/ldbm_modrdn.c:1901
#17 0x00007f92d470c76b in ldbm_back_modrdn (pb=<optimized out>)
    at /home/mareynol/workspaces/389-ds-base/ds/ldap/servers/slapd/back-ldbm/ldbm_modrdn.c:1053
#18 0x00007f92e045d047 in op_shared_rename (pb=pb@entry=0x7f92b2ff4ae0, passin_args=0)
    at /home/mareynol/workspaces/389-ds-base/ds/ldap/servers/slapd/modrdn.c:652
#19 0x00007f92e045d885 in do_modrdn (pb=pb@entry=0x7f92b2ff4ae0) at /home/mareynol/workspaces/389-ds-base/ds/ldap/servers/slapd/modrdn.c:256
#20 0x00000000004183b9 in connection_dispatch_operation (pb=0x7f92b2ff4ae0, op=0xf4b530, conn=0x7f92e0800aa0)
    at /home/mareynol/workspaces/389-ds-base/ds/ldap/servers/slapd/connection.c:655
#21 connection_threadmain () at /home/mareynol/workspaces/389-ds-base/ds/ldap/servers/slapd/connection.c:2534

The checkpointing thread appears to own the db lock:

This is the modrdn thread

(gdb) thread 13
[Switching to thread 13 (Thread 0x7f4e697f2700 (LWP 4177))]
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185     62:     movl    (%rsp), %edi
(gdb) up
#1  0x00007f4e8e0e1570 in __db_pthread_mutex_condwait (env=0x2616fa0, mutex=1965, mutexp=0x7f4e888a6090, timespec=0x0) at ../src/mutex/mut_pthread.c:321
321                     RET_SET((pthread_cond_wait(&mutexp->u.m.cond,
(gdb) p *mutexp
$1 = {u = {m = {mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 1, __kind = 128, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
        __size = '\000' <repeats 12 times>, "\001\000\000\000\200", '\000' <repeats 22 times>, __align = 0}, cond = {__data = {__lock = 0, __futex = 1, __total_seq = 1, __wakeup_seq = 0, __woken_seq = 0,
          __mutex = 0xffffffffffffffff, __nwaiters = 2, __broadcast_seq = 0},
        __size = "\000\000\000\000\001\000\000\000\001", '\000' <repeats 23 times>, "\377\377\377\377\377\377\377\377\002\000\000\000\000\000\000", __align = 4294967296}}, rwlock = {__data = {__lock = 0,
        __nr_readers = 0, __readers_wakeup = 0, __writer_wakeup = 1, __nr_readers_queued = 128, __nr_writers_queued = 0, __writer = 0, __shared = 0, __pad1 = 0, __pad2 = 4294967296, __flags = 1},
      __size = '\000' <repeats 12 times>, "\001\000\000\000\200", '\000' <repeats 27 times>, "\001\000\000\000\001\000\000\000\000\000\000", __align = 0}}, tas = 0 '\000', sharecount = {value = 1}, wait = 1,
  pid = 4145, tid = 139975259846400, mutex_next_link = 0, alloc_id = 15, mutex_set_wait = 0, mutex_set_nowait = 17, mutex_set_rd_wait = 0, mutex_set_rd_nowait = 48, hybrid_wait = 0, hybrid_wakeup = 0,
  flags = 49]

not sure what all the fields mean, but tid = 139975259846400 == 0x7f4e87a3f700
and this is the checkpoint thread

Thread 41 (Thread 0x7f4e87a3f700 (LWP 4149)):
#0  0x00007f4e9519e5f3 in select () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f4e8e29fd19 in __os_sleep (env=0x2616fa0, secs=1, usecs=0) at ../src/os/os_yield.c:90
#2  0x00007f4e8e29fcb9 in __os_yield (env=0x2616fa0, secs=1, usecs=0) at ../src/os/os_yield.c:48
#3  0x00007f4e8e299d8e in __memp_sync_int (env=0x2616fa0, dbmfp=0x0, trickle_max=0, flags=4, wrote_totalp=0x0, interruptedp=0x0) at ../src/mp/mp_sync.c:483
#4  0x00007f4e8e2b1a7a in __txn_checkpoint (env=0x2616fa0, kbytes=0, minutes=0, flags=0) at ../src/txn/txn_chkpt.c:242
#5  0x00007f4e8e2b14fd in __txn_checkpoint_pp (dbenv=0x26ecf90, kbytes=0, minutes=0, flags=0) at ../src/txn/txn_chkpt.c:81
#6  0x00007f4e8bf3aaa7 in checkpoint_threadmain (param=<optimized out>) at /home/mareynol/workspaces/389-ds-base/ds/ldap/servers/slapd/back-ldbm/dblayer.c:4769
#7  0x00007f4e95ad7c2b in _pt_root (arg=0x26f65f0) at ../../../nspr/pr/src/pthreads/ptthread.c:212
#8  0x00007f4e95477ee5 in start_thread (arg=0x7f4e87a3f700) at pthread_create.c:309
#9  0x00007f4e951a6d1d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

reproducer script - needs to be run from ds/dirsrvtests/
reliab7.5_test.py

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.3.6 backlog

3 years ago

Can we still reproduce this issue @mreynolds ?

Metadata Update from @firstyear:
- Custom field reviewstatus adjusted to new
- Issue close_status updated to: None

3 years ago

Can we still reproduce this issue @mreynolds ?

Yes this is still a problem, but some what of a corner case...

You edit dirsrvtests/tests/stress/reliability/reliab_7_5_test.py and change "rename_s" to "rename". Then run it, and the server will get deadlocked in less than 10 passes.

It does not hang if we use "rename_s" as it waits for the result (removing the timing condition).

This issue seems to be coming up even in production.
We are tracking a very similar issue downstream for libdb on RHEL-based systems over here:
https://bugzilla.redhat.com/show_bug.cgi?id=1349779
I am however having a hard time to reproduce this by running the test case as it generally takes a lot of passes for the deadlock to show up on my box (definitely not less than 10).

Metadata Update from @mreynolds:
- Custom field reviewstatus reset (from new)
- Issue set to the milestone: 1.3.7 backlog (was: 1.3.6 backlog)

3 years ago

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.4 backlog (was: 1.3.7 backlog)

3 years ago

Metadata Update from @mreynolds:
- Custom field reviewstatus adjusted to None
- Issue priority set to: normal (was: major)
- Issue set to the milestone: 1.4.5 (was: 1.4 backlog)

5 months ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/1497

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix
- Issue status updated to: Closed (was: Open)

a month ago

Login to comment on this ticket.

Metadata