#48165 Lock not released of an entry in the entry cache
Closed: wontfix None Opened 9 years ago by tbordaz.

A bind operation tries to update the target entry. To do this it triggers an internal modify that tries to lock the entry into the entry cache.

The entry lock, is already acquired (so the bind operation hang). But the owner thread is processing an other operation and can not own the lock.

This is looking like an entry was locked in entry cache during an operation. The operation completed (successfully or not) but the entry lock was not released


The problem happened when running ipa tests.
IPA version is development version (master+user_life_cycle patches, master HEAD is
'''b48cfe0''' ''ldap: Remove IPASimpleLDAPObject'')

DS version is a development version (master+1209573. master HEAD is '''d61a073''' ''Tests to support ticket 48026'')

Hang is possibly dynamic. I ran the attached test script (demo_ulc.sh) several times.

Hang description: Thread 24 is trying to update an entry ("uid=stageadm,cn=users,cn=accounts,SUFFIX") during a bind. The bind is successfull. The owner of the entry (stageadm) in the entry cache is thread 15 that is doing a MOD on a completely different entry (fqdn=vm-yyy.xxx,cn=computers,cn=accounts,SUFFIX" It is like thread 15 did and completed an operation that did not release the entry lock. And now thread 15 is processing a different operation. {{{ (gdb) thread 24 [Switching to thread 24 (Thread 0x7f97a07f8700 (LWP 20310))] #0 0x00007f97c4ae3590 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 (gdb) where #0 0x00007f97c4ae3590 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f97c51386b3 in PR_EnterMonitor (mon=0x7f97800cfea0) at ../../../nspr/pr/src/pthreads/ptsynch.c:592 #2 0x00007f97baf0ca5a in cache_lock_entry (cache=cache@entry=0x7f97c95a2b38, e=0x7f9780085f70) at ldap/servers/slapd/back-ldbm/cache.c:1548 #3 0x00007f97baf1dd65 in find_entry_internal_dn (flags=0, txn=0x7f97a07f3320, lock=1, sdn=0x7f97800d0450, be=0x7f97c96c1c30, pb=0x7f97800cf9e0) at ldap/servers/slapd/back-ldbm/findentry.c:155 #4 find_entry_internal (pb=pb@entry=0x7f97800cf9e0, be=0x7f97c96c1c30, lock=lock@entry=1, txn=txn@entry=0x7f97a07f3320, flags=0, addr=<optimized out>, addr=<optimized out>) at ldap/servers/slapd/back-ldbm/findentry.c:293 #5 0x00007f97baf1e03c in find_entry2modify (pb=pb@entry=0x7f97800cf9e0, be=<optimized out>, addr=<optimized out>, txn=txn@entry=0x7f97a07f3320) at ldap/servers/slapd/back-ldbm/findentry.c:324 #6 0x00007f97baf51f86 in ldbm_back_modify (pb=<optimized out>) at ldap/servers/slapd/back-ldbm/ldbm_modify.c:576 #7 0x00007f97c6d45035 in op_shared_modify (pb=pb@entry=0x7f97800cf9e0, pw_change=pw_change@entry=0, old_pw=0x0) at ldap/servers/slapd/modify.c:1086 #8 0x00007f97c6d45ad7 in modify_internal_pb (pb=pb@entry=0x7f97800cf9e0) at ldap/servers/slapd/modify.c:631 #9 0x00007f97c6d465e3 in slapi_modify_internal_pb (pb=pb@entry=0x7f97800cf9e0) at ldap/servers/slapd/modify.c:486 #10 0x00007f97bc5f7bc3 in ipalockout_postop (pb=0x7f97a07f7b20) at ipa_lockout.c:630 #11 0x00007f97c6d56abf in plugin_call_func (list=0x7f97c950f930, operation=operation@entry=501, pb=pb@entry=0x7f97a07f7b20, call_one=call_one@entry=0) at ldap/servers/slapd/plugin.c:1949 #12 0x00007f97c6d56de3 in plugin_call_list (pb=0x7f97a07f7b20, operation=501, list=<optimized out>) at ldap/servers/slapd/plugin.c:1893 #13 plugin_call_plugins (pb=pb@entry=0x7f97a07f7b20, whichfunction=whichfunction@entry=501) at ldap/servers/slapd/plugin.c:467 #14 0x00007f97c722328d in do_bind (pb=pb@entry=0x7f97a07f7b20) at ldap/servers/slapd/bind.c:424 #15 0x00007f97c72299d4 in connection_dispatch_operation (pb=0x7f97a07f7b20, op=0x7f97c9a44410, conn=0x7f97b404e130) at ldap/servers/slapd/connection.c:635 #16 connection_threadmain () at ldap/servers/slapd/connection.c:2534 #17 0x00007f97c513dcab in _pt_root (arg=0x7f97c9a8d0e0) at ../../../nspr/pr/src/pthreads/ptthread.c:212 #18 0x00007f97c4ade52a in start_thread () from /lib64/libpthread.so.0 #19 0x00007f97c481a79d in clone () from /lib64/libc.so.6 (gdb) print *pb->pb_op->o_params.p.p_modify.modify_mods[0] $10 = {mod_op = 130, mod_type = 0x7f97800526b0 "krbLastSuccessfulAuth", mod_vals = {modv_strvals = 0x7f9780080090, modv_bvals = 0x7f9780080090}} (gdb) print **pb->pb_op->o_params.p.p_modify.modify_mods[0].mod_vals.modv_bvals $16 = {bv_len = 15, bv_val = 0x7f9780086360 "20150422172417Z"} (gdb) frame 2 #2 0x00007f97baf0ca5a in cache_lock_entry (cache=cache@entry=0x7f97c95a2b38, e=0x7f9780085f70) at ldap/servers/slapd/back-ldbm/cache.c:1548 1548 PR_EnterMonitor(e->ep_mutexp); (gdb) print e->ep_mutexp->owner $20 = 140289133967104 -> 0x7f979bfef700 Thread 15 (Thread 0x7f979bfef700 (LWP 20319)): #0 0x00007f97c4ae3590 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f97c51386b3 in PR_EnterMonitor (mon=0x7f97c95a4b80) at ../../../nspr/pr/src/pthreads/ptsynch.c:592 #2 0x00007f97baf11612 in dblayer_lock_backend (be=<optimized out>) at ldap/servers/slapd/back-ldbm/dblayer.c:3978 #3 0x00007f97baf165b6 in dblayer_txn_begin (be=0x7f97c96c1c30, parent_txn=0x0, txn=txn@entry=0x7f979bfea6f0) at ldap/servers/slapd/back-ldbm/dblayer.c:3697 #4 0x00007f97baf51995 in ldbm_back_modify (pb=<optimized out>) at ldap/servers/slapd/back-ldbm/ldbm_modify.c:560 #5 0x00007f97c6d45035 in op_shared_modify (pb=pb@entry=0x7f979bfeeb20, pw_change=pw_change@entry=0, old_pw=0x0) at ldap/servers/slapd/modify.c:1086 #6 0x00007f97c6d46337 in do_modify (pb=pb@entry=0x7f979bfeeb20) at ldap/servers/slapd/modify.c:419 #7 0x00007f97c7229985 in connection_dispatch_operation (pb=0x7f979bfeeb20, op=0x7f97c9b8f6d0, conn=0x7f97b4051310) at ldap/servers/slapd/connection.c:660 #8 connection_threadmain () at ldap/servers/slapd/connection.c:2534 #9 0x00007f97c513dcab in _pt_root (arg=0x7f97c9aaf5e0) at ../../../nspr/pr/src/pthreads/ptthread.c:212 #10 0x00007f97c4ade52a in start_thread () from /lib64/libpthread.so.0 #11 0x00007f97c481a79d in clone () from /lib64/libc.so.6 }}}

Error logs contains some errors (possibly related to the previous operation done by thread 15):

{{{
[22/Apr/2015:19:14:35 +0200] - Parent cn=groups,cn=accounts,SUFFIX has no children. (op 0x2, repl_op 0x10)
[22/Apr/2015:19:14:36 +0200] ldbm_back_delete - conn=335 op=8 parent_update_on_childchange: old_entry=0x7f97b0047f80, new_entry=0x0, rc=-1
[22/Apr/2015:19:19:53 +0200] - Parent cn=groups,cn=accounts,SUFFIX has no children. (op 0x2, repl_op 0x0)
[22/Apr/2015:19:19:53 +0200] ldbm_back_delete - conn=0 op=0 parent_update_on_childchange: old_entry=0x7f97882bfe70, new_entry=0x0, rc=-1
[22/Apr/2015:19:19:53 +0200] managed-entries-plugin - mep_del_post_op: failed to delete managed entry (cn=my_manager,cn=groups,cn=accounts,SUFFIX) - error (1)
[22/Apr/2015:19:19:53 +0200] - Parent cn=groups,cn=accounts,SUFFIX has no children. (op 0x2, repl_op 0x0)
[22/Apr/2015:19:19:53 +0200] ldbm_back_delete - conn=0 op=0 parent_update_on_childchange: old_entry=0x7f97882bfe70, new_entry=0x0, rc=-1
[22/Apr/2015:19:19:53 +0200] managed-entries-plugin - mep_del_post_op: failed to delete managed entry (cn=my_manager,cn=groups,cn=accounts,SUFFIX) - error (1)
[22/Apr/2015:19:24:01 +0200] referint-plugin - _update_all_per_mod: entry cn=ipausers,cn=groups,cn=accounts,SUFFIX: deleting "member: uid=my_manager,cn=users,cn=accounts,SUFFIX" failed (16)
[22/Apr/2015:19:24:01 +0200] - SLAPI_PLUGIN_BE_TXN_POST_DELETE_FN plugin returned error code but did not set SLAPI_RESULT_CODE
[22/Apr/2015:19:24:01 +0200] managed-entries-plugin - mep_del_post_op: failed to delete managed entry (cn=my_manager,cn=groups,cn=accounts,SUFFIX) - error (32)
[22/Apr/2015:19:24:17 +0200] DSRetroclPlugin - replog: an error occured while adding change number 1343, dn = changenumber=1343,cn=changelog: Already exists.
[22/Apr/2015:19:24:17 +0200] retrocl-plugin - retrocl_postob: operation failure [68]
[22/Apr/2015:19:34:14 +0200] DSRetroclPlugin - replog: an error occured while adding change number 1343, dn = changenumber=1343,cn=changelog: Already exists.
[22/Apr/2015:19:34:14 +0200] retrocl-plugin - retrocl_postob: operation failure [68]
}}}

The hang was detected at '''22/Apr/2015:19:34:14''' (*mods of the bind 20150422172417Z") but the failure (lock not release) happened before

The test case (demo_ulc.sh) can reproduce (1 time out of 10) this kind of error

{{{
[23/Apr/2015:16:18:32 +0200] - Parent cn=groups,cn=accounts,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com has no children. (op 0x2, repl_op 0x10)
[23/Apr/2015:16:18:32 +0200] ldbm_back_delete - conn=507 op=8 parent_update_on_childchange: old_entry=0x7fb01c08dec0, new_entry=0x0, rc=-1
[23/Apr/2015:16:21:14 +0200] - Parent cn=groups,cn=accounts,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com has no children. (op 0x2, repl_op 0x0)
[23/Apr/2015:16:21:14 +0200] ldbm_back_delete - conn=0 op=0 parent_update_on_childchange: old_entry=0x7fb034072d30, new_entry=0x0, rc=-1
[23/Apr/2015:16:21:14 +0200] managed-entries-plugin - mep_del_post_op: failed to delete managed entry (cn=my_manager,cn=groups,cn=accounts,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com) - error (1)
[23/Apr/2015:16:21:14 +0200] - Parent cn=groups,cn=accounts,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com has no children. (op 0x2, repl_op 0x0)
[23/Apr/2015:16:21:14 +0200] ldbm_back_delete - conn=0 op=0 parent_update_on_childchange: old_entry=0x7fb034072d30, new_entry=0x0, rc=-1
[23/Apr/2015:16:21:14 +0200] managed-entries-plugin - mep_del_post_op: failed to delete managed entry (cn=my_manager,cn=groups,cn=accounts,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com) - error (1)

[23/Apr/2015:16:18:32 +0200] conn=507 op=8 DEL dn="cn=my_group,cn=groups,cn=accounts,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com"
[23/Apr/2015:16:18:32 +0200] conn=507 op=8 RESULT err=1 tag=107 nentries=0 etime=0
...
[23/Apr/2015:16:21:13 +0200] conn=530 op=20 DEL dn="uid=my_manager,cn=users,cn=accounts,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com"
[23/Apr/2015:16:21:14 +0200] conn=530 op=20 RESULT err=1 tag=107 nentries=0 etime=1

}}}

Replying to [comment:5 tbordaz]: > > > The test case (demo_ulc.sh) can reproduce (1 time out of 10) this kind of error > > {{{ > [23/Apr/2015:16:18:32 +0200] - Parent cn=groups,cn=accounts,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com has no children. (op 0x2, repl_op 0x10) > [23/Apr/2015:16:18:32 +0200] ldbm_back_delete - conn=507 op=8 parent_update_on_childchange: old_entry=0x7fb01c08dec0, new_entry=0x0, rc=-1 > [23/Apr/2015:16:21:14 +0200] - Parent cn=groups,cn=accounts,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com has no children. (op 0x2, repl_op 0x0) > [23/Apr/2015:16:21:14 +0200] ldbm_back_delete - conn=0 op=0 parent_update_on_childchange: old_entry=0x7fb034072d30, new_entry=0x0, rc=-1 > [23/Apr/2015:16:21:14 +0200] managed-entries-plugin - mep_del_post_op: failed to delete managed entry (cn=my_manager,cn=groups,cn=accounts,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com) - error (1) > [23/Apr/2015:16:21:14 +0200] - Parent cn=groups,cn=accounts,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com has no children. (op 0x2, repl_op 0x0) > [23/Apr/2015:16:21:14 +0200] ldbm_back_delete - conn=0 op=0 parent_update_on_childchange: old_entry=0x7fb034072d30, new_entry=0x0, rc=-1 > [23/Apr/2015:16:21:14 +0200] managed-entries-plugin - mep_del_post_op: failed to delete managed entry (cn=my_manager,cn=groups,cn=accounts,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com) - error (1) > > > > [23/Apr/2015:16:18:32 +0200] conn=507 op=8 DEL dn="cn=my_group,cn=groups,cn=accounts,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com" > [23/Apr/2015:16:18:32 +0200] conn=507 op=8 RESULT err=1 tag=107 nentries=0 etime=0 > ... > [23/Apr/2015:16:21:13 +0200] conn=530 op=20 DEL dn="uid=my_manager,cn=users,cn=accounts,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com" > [23/Apr/2015:16:21:14 +0200] conn=530 op=20 RESULT err=1 tag=107 nentries=0 etime=1 > > > }}} Then when trying to uninstall ipa, I hit the hang: {{{ Thread 3 (Thread 0x7fb0267e4700 (LWP 32187)): #0 0x00007fb054acb590 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007fb0551206b3 in PR_EnterMonitor (mon=0x7fb0101b18b0) at ../../../nspr/pr/src/pthreads/ptsynch.c:592 #2 0x00007fb04aef4a5a in cache_lock_entry (cache=cache@entry=0x7fb058190108, e=0x7fb010185da0) at ldap/servers/slapd/back-ldbm/cache.c:1548 #3 0x00007fb04af05d65 in find_entry_internal_dn (flags=0, txn=0x7fb0267df320, lock=1, sdn=0x7fb01c266560, be=0x7fb058187e50, pb=0x7fb01c079cd0) at ldap/servers/slapd/back-ldbm/findentry.c:155 #4 find_entry_internal (pb=pb@entry=0x7fb01c079cd0, be=0x7fb058187e50, lock=lock@entry=1, txn=txn@entry=0x7fb0267df320, flags=0, addr=<optimized out>, addr=<optimized out>) at ldap/servers/slapd/back-ldbm/findentry.c:293 #5 0x00007fb04af0603c in find_entry2modify (pb=pb@entry=0x7fb01c079cd0, be=<optimized out>, addr=<optimized out>, txn=txn@entry=0x7fb0267df320) at ldap/servers/slapd/back-ldbm/findentry.c:324 #6 0x00007fb04af39f86 in ldbm_back_modify (pb=<optimized out>) at ldap/servers/slapd/back-ldbm/ldbm_modify.c:576 #7 0x00007fb056d2d035 in op_shared_modify (pb=pb@entry=0x7fb01c079cd0, pw_change=pw_change@entry=0, old_pw=0x0) at ldap/servers/slapd/modify.c:1086 #8 0x00007fb056d2dad7 in modify_internal_pb (pb=pb@entry=0x7fb01c079cd0) at ldap/servers/slapd/modify.c:631 #9 0x00007fb056d2e5e3 in slapi_modify_internal_pb (pb=pb@entry=0x7fb01c079cd0) at ldap/servers/slapd/modify.c:486 #10 0x00007fb04c5dfbc3 in ipalockout_postop (pb=0x7fb0267e3b20) at ipa_lockout.c:630 #11 0x00007fb056d3eabf in plugin_call_func (list=0x7fb0580f6a10, operation=operation@entry=501, pb=pb@entry=0x7fb0267e3b20, call_one=call_one@entry=0) at ldap/servers/slapd/plugin.c:1949 #12 0x00007fb056d3ede3 in plugin_call_list (pb=0x7fb0267e3b20, operation=501, list=<optimized out>) at ldap/servers/slapd/plugin.c:1893 #13 plugin_call_plugins (pb=pb@entry=0x7fb0267e3b20, whichfunction=whichfunction@entry=501) at ldap/servers/slapd/plugin.c:467 #14 0x00007fb05720b28d in do_bind (pb=pb@entry=0x7fb0267e3b20) at ldap/servers/slapd/bind.c:424 #15 0x00007fb0572119d4 in connection_dispatch_operation (pb=0x7fb0267e3b20, op=0x7fb0582d6570, conn=0x7fb0440bb800) at ldap/servers/slapd/connection.c:635 #16 connection_threadmain () at ldap/servers/slapd/connection.c:2534 #17 0x00007fb055125cab in _pt_root (arg=0x7fb05860ecd0) at ../../../nspr/pr/src/pthreads/ptthread.c:212 #18 0x00007fb054ac652a in start_thread () from /lib64/libpthread.so.0 #19 0x00007fb05480279d in clone () from /lib64/libc.so.6 }}} So when the error occurs, very likely a bind (admin or stageadm) will hang

The problem is around the numsubordinates attributes. The failure occurs because when deleting an entry and updating the parent's numbsubordinates an error condition is detected (the parent is supposed to have no child). In fact we can see that this attribute value is invalid:

{{{
ldapsearch -LLL -D "cn=directory manager" -w Secret123 -b "cn=groups,cn=accounts,SUFFIX" -s base numsubordinates
dn: cn=groups,cn=accounts,SUFFIX
numsubordinates: 2

ldapsearch -D "cn=directory manager" -w Secret123 -b "cn=groups,cn=accounts,SUFFIX" -LLL dn
dn: cn=groups,cn=accounts,SUFFIX

dn: cn=admins,cn=groups,cn=accounts,SUFFIX

dn: cn=ipausers,cn=groups,cn=accounts,SUFFIX

dn: cn=editors,cn=groups,cn=accounts,SUFFIX

dn: cn=trust admins,cn=groups,cn=accounts,SUFFIX

dn: cn=my_group,cn=groups,cn=accounts,SUFFIX

dn: cn=stageadm,cn=groups,cn=accounts,SUFFIX

dn: cn=my_manager,cn=groups,cn=accounts,SUFFIX

[24/Apr/2015:11:49:59 +0200] - Parent cn=groups,cn=accounts,SUFFIX has no children. (op 0x2, repl_op 0x10)
[24/Apr/2015:11:49:59 +0200] ldbm_back_delete - conn=594 op=8 parent_update_on_childchange: old_entry=0x7f07600c45d0, new_entry=0x0, rc=-1
[24/Apr/2015:11:54:55 +0200] - Parent cn=groups,cn=accounts,SUFFIX has no children. (op 0x2, repl_op 0x10)
[24/Apr/2015:11:54:55 +0200] ldbm_back_delete - conn=627 op=8 parent_update_on_childchange: old_entry=0x7f07841919f0, new_entry=0x0, rc=-1

}}}

This bug was very likely a duplicate of https://fedorahosted.org/389/ticket/47978, where a thread could forget to unlock an entry in entry cache.

Metadata Update from @tbordaz:
- Issue assigned to tbordaz
- Issue set to the milestone: 1.3.5 backlog

7 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/1496

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Duplicate)

3 years ago

Login to comment on this ticket.

Metadata