#5549 Slapi-nis deadlock
Closed: Invalid None Opened 8 years ago by tbordaz.

The version are:
Fedora 23
freeipa freeipa-server-dns-4.2.90-20151211063743Zjenkins50gitccb2f52.fc23.x86_64
389-ds 389-ds-base-1.3.4.5-1.fc23.x86_64

The deadlock should not be systematic but may be a reproducible test case will be provided later.

- This is a deadlock in slapi-nis with two threads acquiring locks in the opposite order
  Update main suffix (acquire DB page lock) -> update slapi-nis map (map lock in write)
  Update config (acquire slapi-nis map lock write) -> search slapi-nis mapped members of the updated config entry (need DB page lock)

- Details of the DB locks are


Thread 18 (Thread 0x7fab86ff5700 (LWP 27687)):
80000062 dd= 4 locks held 51   write locks 27   pid/thread 27660/140374681016064 (7FAB86FF5700) flags 0    priority 100
80000062 WRITE         1 HELD    userRoot/numsubordinates.db page          1
...
80000062 WRITE         5 HELD    userRoot/objectclass.db   page          5
80000062 READ          2 HELD    userRoot/objectclass.db   page          5


Thread 12 (Thread 0x7fab83fef700 (LWP 27693)):
      2b dd= 0 locks held 0    write locks 0    pid/thread 27660/140374630659840 (7FAB83FEF700) flags 0    priority 100
      2b READ          1 WAIT    userRoot/objectclass.db   page          5




- Details on the threads



Thread 18 (Thread 0x7fab86ff5700 (LWP 27687)):
#4  0x00007fabb49af3aa in slapi_rwlock_wrlock (rwlock=<optimized out>) at ldap/servers/slapd/slapi2nspr.c:238
#5  0x00007faba648fd28 in wrap_rwlock_wrlock (rwlock=<optimized out>) at wrap.c:238
#6  0x00007faba648f3bc in map_wrlock () at map.c:1230
#7  0x00007faba648021b in backend_shr_delete_cb (pb=pb@entry=0x7fab86ff4b00) at back-shr.c:2496
#9  backend_shr_betxn_post_delete_cb (pb=0x7fab86ff4b00) at back-shr.c:2548
#12 plugin_call_plugins (pb=pb@entry=0x7fab86ff4b00, whichfunction=whichfunction@entry=563) at ldap/servers/slapd/plugin.c:438
#13 0x00007faba823f8a5 in ldbm_back_delete (pb=0x7fab86ff4b00) at ldap/servers/slapd/back-ldbm/ldbm_delete.c:1212
#14 0x00007fabb4940000 in op_shared_delete (pb=pb@entry=0x7fab86ff4b00) at ldap/servers/slapd/delete.c:333
#15 0x00007fabb49402c3 in do_delete (pb=pb@entry=0x7fab86ff4b00) at ldap/servers/slapd/delete.c:97

It deletes the SUFFIX entry:

"cn=dom-223.ipa.test-to-dom-<REALM>,cn=domain,cn=topology,cn=ipa,cn=etc,dc=ipa,dc=test"
rdn: cn=dom-223.ipa.test
objectClass;vucsn-566a924f000000030000: top
objectClass;vucsn-566a924f000000030000: nsContainer
objectClass;vucsn-566a924f000000030000: ipaReplTopoManagedServer
objectClass;vucsn-566a924f000000030000: ipaConfigObject
objectClass;vucsn-566a924f000000030000: ipaSupportedDomainLevelConfig
cn;vucsn-566a924f000000030000;mdcsn-566a924f000000030000: dom-223.ipa.test
ipaReplTopoManagedSuffix;vucsn-566a924f000000030000: dc=ipa,dc=test
ipaMinDomainLevel;vucsn-566a924f000000030000: 0
ipaMaxDomainLevel;vucsn-566a924f000000030000: 1
creatorsName;vucsn-566a924f000000030000: cn=directory manager
modifiersName;vucsn-566a924f000000030000: cn=directory manager
createTimestamp;vucsn-566a924f000000030000: 20151211090726Z
modifyTimestamp;vucsn-566a924f000000030000: 20151211090726Z
nsUniqueId: 7c37e39b-9fe611e5-98c8ce6e-61903fe5
parentid: 25
entryid: 428
entryusn: 7
numSubordinates: 1

That acquires the writes lock on page 5 (containing the objectclass 'eq' indexes)

page 5: btree leaf: LSN [1][393914]: level 1
[404] 6756 len:  26 data: =iparepltopomanagedserver\0
[406] 6756 len:  26 data: =iparepltopomanagedserver\0
[438] 6580 len:  31 data: =ipasupporteddomainlevelconfig\0
[440] 6580 len:  31 data: =ipasupporteddomainlevelconfig\0
[660] 5580 len:  13 data: =nscontainer\0
...
[830] 5580 len:  13 data: =nscontainer\0
[886] 4632 len:   5 data: =top\0

Then tries to acquire the slapi-nis map lock (write) to update the map with the deleted entry

The second thread is:

Thread 12 (Thread 0x7fab83fef700 (LWP 27693)):
#3  0x00007fabac6c352f in __db_tas_mutex_lock_int (nowait=0, timeout=0, mutex=2371, env=<optimized out>) at ../../src/mutex/mut_tas.c:255
#6  0x00007fabac76e727 in __lock_get (env=env@entry=0x55e1cc72fb80, locker=0x7faba47475d8, flags=0, obj=obj@entry=0x55e1cc71f680, lock_mode=lock_mode@entry=DB_LOCK_READ, lock=lock@entry=0x7fab83fda9a0) at ../../src/lock/lock.c:469
#7  0x00007fabac79a4c7 in __db_lget (dbc=dbc@entry=0x55e1cc71f590, action=action@entry=0, pgno=5, mode=mode@entry=DB_LOCK_READ, lkflags=lkflags@entry=0, lockp=lockp@entry=0x7fab83fda9a0) at ../../src/db/db_meta.c:1257
#8  0x00007fabac6e1222 in __bam_search (dbc=dbc@entry=0x55e1cc71f590, root_pgno=1, root_pgno@entry=0, key=key@entry=0x7fab83fdac70, flags=flags@entry=1409, slevel=slevel@entry=1, recnop=recnop@entry=0x0, exactp=0x7fab83fdaac4) at ../../src/btree/bt_search.c:723
#9  0x00007fabac6cbf0c in __bamc_search (dbc=dbc@entry=0x55e1cc71f590, root_pgno=root_pgno@entry=0, key=0x7fab83fdac70, flags=<optimized out>, exactp=<optimized out>) at ../../src/btree/bt_cursor.c:2804
#10 0x00007fabac6ce02f in __bamc_get (dbc=0x55e1cc71f590, key=<optimized out>, data=<optimized out>, flags=<optimized out>, pgnop=0x7fab83fdab54) at ../../src/btree/bt_cursor.c:1099
#13 0x00007fabac795ef2 in __dbc_get_pp (dbc=0x7fab3000ab30, key=0x7fab83fdac70, data=0x7fab83fdaca0, flags=2074) at ../../src/db/db_iface.c:2361
#14 0x00007faba821da2e in idl_new_fetch (be=0x55e1cc586d30, db=0x55e1cc6fda50, inkey=0x7fab83fdce00, txn=0x0, a=0x55e1cc71a950, flag_err=0x7fab83fe3b4c, allidslimit=4000) at ldap/servers/slapd/back-ldbm/idl_new.c:202
#16 0x00007faba822c063 in index_read_ext_allids (pb=pb@entry=0x7fab2800d0c0, be=be@entry=0x55e1cc586d30, type=type@entry=0x7fab2800d500 "objectclass", indextype=indextype@entry=0x7faba827305f "eq", val=<optimized out>, txn=txn@entry=0x7fab83fe10e0, err=0x7fab83fe3b4c, unindexed=0x7fab83fe10d4, allidslimit=4000) at ldap/servers/slapd/back-ldbm/index.c:1035
#17 0x00007faba82165bd in keys2idl (pb=pb@entry=0x7fab2800d0c0, be=be@entry=0x55e1cc586d30, type=0x7fab2800d500 "objectclass", indextype=indextype@entry=0x7faba827305f "eq", ivals=0x7fab83fe11c0, err=err@entry=0x7fab83fe3b4c, unindexed=0x7fab83fe10d4, txn=0x7fab83fe10e0, allidslimit=4000) at ldap/servers/slapd/back-ldbm/filterindex.c:977
#19 0x00007faba82173ba in filter_candidates_ext (pb=pb@entry=0x7fab2800d0c0, be=be@entry=0x55e1cc586d30, base=base@entry=0x7fab28008700 "cn=groups,cn=accounts,dc=ipa,dc=test", f=f@entry=0x7fab2800db20, nextf=nextf@entry=0x0, range=range@entry=0, err=0x7fab83fe3b4c, allidslimit=4000) at ldap/servers/slapd/back-ldbm/filterindex.c:111
#20 0x00007faba82183bf in list_candidates (pb=pb@entry=0x7fab2800d0c0, be=be@entry=0x55e1cc586d30, base=base@entry=0x7fab28008700 "cn=groups,cn=accounts,dc=ipa,dc=test", flist=flist@entry=0x7fab2800d390, ftype=161, err=0x7fab83fe3b4c, allidslimit=4000) at ldap/servers/slapd/back-ldbm/filterindex.c:808
#25 0x00007faba8254164 in onelevel_candidates (err=0x7fab83fe3b4c, lookup_returned_allidsp=0x7fab83fe3b3c, managedsait=<optimized out>, filter=<optimized out>, e=0x55e1cc7331f0, base=0x7fab28008700 "cn=groups,cn=accounts,dc=ipa,dc=test", be=0x55e1cc586d30, pb=0x7fab2800d0c0) at ldap/servers/slapd/back-ldbm/ldbm_search.c:1143
#26 build_candidate_list (candidates=0x7fab83fe3b78, lookup_returned_allidsp=0x7fab83fe3b3c, scope=<optimized out>, base=0x7fab28008700 "cn=groups,cn=accounts,dc=ipa,dc=test", e=<optimized out>, be=0x55e1cc586d30, pb=0x7fab2800d0c0) at ldap/servers/slapd/back-ldbm/ldbm_search.c:1010
#30 0x00007fabb4991b9d in slapi_search_internal_callback_pb (pb=pb@entry=0x7fab2800d0c0, callback_data=callback_data@entry=0x7fab83fea2e0, prc=prc@entry=0x0, psec=psec@entry=0x7faba647feb0 <backend_shr_note_entry_sdn_cb>, prec=prec@entry=0x0) at ldap/servers/slapd/plugin_internal_op.c:564
#31 0x00007faba64816c7 in backend_shr_update_references_cb (group=0x55e1cc703f70 "cn=compat,dc=ipa,dc=test", set=0x55e1cc73ae60 "cn=groups", flag=<optimized out>, backend_data=0x55e1cc729fb0, cbdata_ptr=0x7fab83fea3d0) at back-shr.c:1369
#32 0x00007faba648de9f in map_data_foreach_map (state=state@entry=0x55e1cc563aa0, domain_name=domain_name@entry=0x0, fn=fn@entry=0x7faba6480d40 <backend_shr_update_references_cb>, cbdata=cbdata@entry=0x7fab83fea3d0) at map.c:347
#33 0x00007faba647f89b in backend_shr_update_references (state=0x55e1cc563aa0, pb=pb@entry=0x7fab83feeb00, e=<optimized out>, mods=mods@entry=0x0, modlist=modlist@entry=0x0) at back-shr.c:1655
#34 0x00007faba6480806 in backend_shr_add_cb (pb=pb@entry=0x7fab83feeb00) at back-shr.c:1778
#36 backend_shr_betxn_post_add_cb (pb=0x7fab83feeb00) at back-shr.c:1806
#39 plugin_call_plugins (pb=pb@entry=0x7fab83feeb00, whichfunction=whichfunction@entry=560) at ldap/servers/slapd/plugin.c:438
#40 0x00007fabb4949932 in dse_add (pb=0x7fab83feeb00) at ldap/servers/slapd/dse.c:2403
#41 0x00007fabb4932c8a in op_shared_add (pb=pb@entry=0x7fab83feeb00) at ldap/servers/slapd/add.c:714
#42 0x00007fabb4934008 in do_add (pb=pb@entry=0x7fab83feeb00) at ldap/servers/slapd/add.c:226

Add the 'ipa_pwd_extop' plugin config entry
ADD "cn=ipa_pwd_extop,cn=plugins,cn=config"

That triggers an internal search while holding (wlock slapi-nis map)
backend_shr_update_references
-s one
-b "cn=groups,cn=accounts,dc=ipa,dc=test"
"(member=cn=ipa_pwd_extop,cn=plugins,cn=config)"

But this search also needs to access the DB page 5 that is already acquired by Thread 18.

- Config entry "cn=ipa_pwd_extop,cn=plugins,cn=config" should be excluded from the
  scope of the compat schema plugin.
  In fact this attribute looks missing from 'ng', 'groups', 'computers' and 'cn=sudoers'.
  This is the missing configuration:
    schema-compat-restrict-subtree: cn=Schema Compatibility,cn=plugins,cn=config

No identified reproducible test case for that ticket

In conclusion from https://fedorahosted.org/freeipa/ticket/5549#comment:2, this deadlock should not occur if schema-compat configuration (dse.ldif) has the definition:

schema-compat-restrict-subtree: cn=Schema Compatibility,cn=plugins,cn=config

The reason why this definition is missing is unknown.

Closing given that it happens very rarely. Please re-open if it happens again and more info is known - e.g. reproducer.

Metadata Update from @tbordaz:
- Issue assigned to someone
- Issue set to the milestone: 0.0 NEEDS_TRIAGE

7 years ago

Login to comment on this ticket.

Metadata