#4635 CI tests: deadlock in schema compat plugin (between automember_update_membership task and dse update)
Closed: Fixed. Opened 5 years ago by tbordaz.

The deadlock occurs while running the FreeIPA unit tests (make-test), in particular test_automember.

Tested with the FreeIPA 4.0.3 branch and the 389-ds master branch (CI tests).

repoquery -i freeipa-server

Name        : freeipa-server
Version     : 4.0.3GITb89c184
Release     : 0.fc20
Architecture: x86_64
Size        : 4036345
Packager    : None
Group       : System Environment/Base
URL         : http://www.freeipa.org/
License     : GPLv3+
Repository  : tbordaz-freeIPA_40
Summary     : The IPA authentication server
Source      : freeipa-4.0.3GITb89c184-0.fc20.src.rpm
Description :
IPA is an integrated solution to provide centrally managed Identity (machine,
user, virtual machines, groups, authentication credentials), Policy
(configuration settings, access control information) and Audit (events,
logs, analysis thereof). If you are installing an IPA server you need
to install this package (in other words, most people should NOT install
this package).

[root@vm-043 db]# repoquery -i 389-ds-base

Name        : 389-ds-base
Version     : 2014_10_16
Release     : 1.fc20
Architecture: x86_64
Size        : 5489313
Packager    : None
Group       : System Environment/Daemons
URL         : http://port389.org/
License     : GPLv2 with exceptions
Repository  : mreynolds-389-ds-base
Summary     : 389 Directory Server (base)
Source      : 389-ds-base-2014_10_16-1.fc20.src.rpm
Description :
389 Directory Server is an LDAPv3 compliant server.  The base package includes
the LDAP server and command line utilities for server administration.

The deadlock occurs twice during the tests.


The deadlock condition is the following:

Thread 11 is holding the schema-compat map lock (backend_shr_add_cb/map_wrlock).
Thread  2 is waiting for the schema-compat map lock (backend_shr_modify_cb/map_wrlock).

At the same time:
Thread  2 is holding the member.db page (#18) lock.
Thread 11 is waiting for the member.db page (#18) lock.
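The condition above is a classic lock-order inversion. The Python sketch below is a simplified model, not 389-ds or slapi-nis code: two plain mutexes stand in for the member.db page 18 lock and the schema-compat map lock, a barrier forces the unlucky interleaving, and a lock timeout stands in for the hang so that the script terminates. The function name run_pair and the timeout values are illustrative assumptions.

```python
import threading


def run_pair(order_t2, order_t11, lock_timeout=0.3, barrier_timeout=1.0):
    """Run two threads that each take two locks in the given order.

    Returns True if both threads obtained both locks, False if at least one
    gave up waiting (i.e. the real server would have hung).
    """
    locks = {"page18": threading.Lock(), "map": threading.Lock()}
    barrier = threading.Barrier(2)
    results = {}

    def worker(name, order):
        first, second = locks[order[0]], locks[order[1]]
        with first:
            try:
                # Wait until the other thread also holds its first lock.
                barrier.wait(timeout=barrier_timeout)
            except threading.BrokenBarrierError:
                # The other thread is blocked on our first lock: no overlap.
                pass
            # A timeout replaces the indefinite wait seen in the backtraces.
            got = second.acquire(timeout=lock_timeout)
            if got:
                second.release()
            results[name] = got

    threads = [
        threading.Thread(target=worker, args=("thread2", order_t2)),
        threading.Thread(target=worker, args=("thread11", order_t11)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(results.values())


# Thread 2: page lock, then map lock. Thread 11: map lock, then page lock.
print(run_pair(("page18", "map"), ("map", "page18")))  # inverted order
print(run_pair(("page18", "map"), ("page18", "map")))  # same order
```

With the inverted order at least one thread times out, so run_pair returns False; when both threads take the locks in the same order, both finish and it returns True.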


Thread 11 (Thread 0x7fc90afed700 (LWP 18394)):
#0  0x00007fc936284d20 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fc9307dd1c3 in __db_hybrid_mutex_suspend () from /lib64/libdb-5.3.so
#2  0x00007fc9307dc5a8 in __db_tas_mutex_lock () from /lib64/libdb-5.3.so
#3  0x00007fc930886fda in __lock_get_internal () from /lib64/libdb-5.3.so
#4  0x00007fc930887ac0 in __lock_get () from /lib64/libdb-5.3.so
#5  0x00007fc9308b36da in __db_lget () from /lib64/libdb-5.3.so
#6  0x00007fc9307fa4a7 in __bam_search () from /lib64/libdb-5.3.so
#7  0x00007fc9307e5126 in __bamc_search () from /lib64/libdb-5.3.so
#8  0x00007fc9307e6bdf in __bamc_get () from /lib64/libdb-5.3.so
#9  0x00007fc9308a0156 in __dbc_iget () from /lib64/libdb-5.3.so
#10 0x00007fc9308af0b4 in __dbc_get_pp () from /lib64/libdb-5.3.so
#11 0x00007fc92cbae1f0 in idl_new_fetch () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#12 0x00007fc92cbbc656 in index_read_ext_allids () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#13 0x00007fc92cba6e44 in keys2idl () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#14 0x00007fc92cba75a3 in ava_candidates.isra () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#15 0x00007fc92cba7b92 in filter_candidates_ext () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#16 0x00007fc92cba8c06 in list_candidates () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#17 0x00007fc92cba7b00 in filter_candidates_ext () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#18 0x00007fc92cba8c06 in list_candidates () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#19 0x00007fc92cba7b00 in filter_candidates_ext () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#20 0x00007fc92cba91fa in filter_candidates () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#21 0x00007fc92cbe471e in ldbm_back_search () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#22 0x00007fc9384e5a89 in op_shared_search () from /usr/lib64/dirsrv/libslapd.so.0
#23 0x00007fc9384f5d0e in search_internal_callback_pb () from /usr/lib64/dirsrv/libslapd.so.0
#24 0x00007fc92ae23c01 in backend_shr_update_references_cb () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#25 0x00007fc92ae3117f in map_data_foreach_map () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#26 0x00007fc92ae21a2b in backend_shr_update_references () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#27 0x00007fc92ae22b86 in backend_shr_add_cb.part () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#28 0x00007fc92ae22ce1 in backend_shr_betxn_post_add_cb () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#29 0x00007fc9384f1d30 in plugin_call_func () from /usr/lib64/dirsrv/libslapd.so.0
#30 0x00007fc9384f1f88 in plugin_call_plugins () from /usr/lib64/dirsrv/libslapd.so.0
#31 0x00007fc9384af30a in dse_add () from /usr/lib64/dirsrv/libslapd.so.0
#32 0x00007fc938498cba in op_shared_add () from /usr/lib64/dirsrv/libslapd.so.0
#33 0x00007fc93849a000 in do_add () from /usr/lib64/dirsrv/libslapd.so.0
#34 0x00007fc9389be184 in connection_threadmain ()
#35 0x00007fc9368e0e5b in _pt_root () from /lib64/libnspr4.so
#36 0x00007fc936280f33 in start_thread () from /lib64/libpthread.so.0
#37 0x00007fc935faeded in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7fc9067e4700 (LWP 18927)):
#0  0x00007fc93628468e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00007fc92ae24b70 in backend_shr_modify_cb.part () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#2  0x00007fc92ae25311 in backend_shr_betxn_post_modify_cb () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#3  0x00007fc9384f1d30 in plugin_call_func () from /usr/lib64/dirsrv/libslapd.so.0
#4  0x00007fc9384f1f88 in plugin_call_plugins () from /usr/lib64/dirsrv/libslapd.so.0
#5  0x00007fc92cbdcc49 in ldbm_back_modify () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#6  0x00007fc9384df021 in op_shared_modify () from /usr/lib64/dirsrv/libslapd.so.0
#7  0x00007fc9384dfad4 in modify_internal_pb () from /usr/lib64/dirsrv/libslapd.so.0
#8  0x00007fc92f4c8975 in automember_add_member_value () from /usr/lib64/dirsrv/plugins/libautomember-plugin.so
#9  0x00007fc92f4c8e13 in automember_update_membership () from /usr/lib64/dirsrv/plugins/libautomember-plugin.so
#10 0x00007fc92f4c95aa in automember_rebuild_task_thread () from /usr/lib64/dirsrv/plugins/libautomember-plugin.so
#11 0x00007fc9368e0e5b in _pt_root () from /lib64/libnspr4.so
#12 0x00007fc936280f33 in start_thread () from /lib64/libpthread.so.0
#13 0x00007fc935faeded in clone () from /lib64/libc.so.6


Thread 11
      3f dd=123 locks held 0    write locks 0    pid/thread 18358/140501449627392 flags 0    priority 100
      3f READ          1 WAIT    userRoot/member.db        page         18

Thread 2
80000f45 dd= 0 locks held 46   write locks 42   pid/thread 18358/140501374093056 flags 0    priority 100
80000f45 READ          1 HELD    changelog/nsuniqueid.db   page          6
80000f45 READ          3 HELD    changelog/entryrdn.db     page         28
...
80000f45 WRITE         2 HELD    userRoot/member.db        page         18
...

At first glance, it seems that the automember_update_membership task may acquire the map lock in the opposite order from the regular schema-compat postop plugin, as it already owns some DB locks.

I assume you are investigating it; if possible, it would be great to resolve it in the 4.1 time frame.

The deadlock pattern is two locks taken in opposite order.
Thread 2 creates a txn and acquires a DB page lock (for its update); then a betxn-post-plugin hangs trying to acquire a private plugin lock (the map lock).
Thread 11 processes an update, and the same betxn-post-plugin acquires the private plugin lock (the map lock). It then hangs during an internal search because it needs a DB page lock that Thread 2 is holding.

I wonder whether the internal search should be done with the parent txn, so that the DB deadlock detection would abort the txn. Currently the search is done with a new pblock.

Another (better?) option would be to acquire the map lock for each entry retrieved by the internal search. I will discuss this with Alexander.
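A minimal sketch of the per-entry option, under the same simplified model (plain mutexes; automember_modify and dse_add_postop are hypothetical names for the Thread 2 and Thread 11 code paths): the internal search runs and copies its results out before any map lock is taken, so the map lock is never held while waiting for a DB page lock, and both paths take the locks in the same order.

```python
import threading


def run_per_entry_locking(entries, timeout=2.0):
    """Return (finished, map_updates) for one modify and one add post-op."""
    page_lock = threading.Lock()  # stands in for the member.db page lock
    map_lock = threading.Lock()   # stands in for the schema-compat map lock
    map_updates = []

    def automember_modify():
        # Like Thread 2: the txn holds the page lock, then the betxn
        # post-op plugin takes the map lock (page -> map).
        with page_lock:
            with map_lock:
                map_updates.append("modify")

    def dse_add_postop():
        # Proposed order for Thread 11: run the internal search while
        # holding only the page lock, and copy the matching entries out.
        with page_lock:
            found = list(entries)
        # The page lock is released before any map lock is taken, so the
        # map lock is never held while waiting for a DB lock.
        for dn in found:
            with map_lock:
                map_updates.append(dn)

    threads = [
        threading.Thread(target=automember_modify, daemon=True),
        threading.Thread(target=dse_add_postop, daemon=True),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout)
    finished = not any(t.is_alive() for t in threads)
    return finished, map_updates
```

One trade-off to consider: entries copied out of the search can change before the map is updated, so the plugin would presumably need to tolerate slightly stale search results.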

This deadlock situation happened after the fix:

86b5dce Ignore irrelevant subtrees in schema compat plugin

Without the fix I was unable to reproduce the deadlock, but with the fix I hit it systematically.

The 86b5dce fix excludes irrelevant subtrees from the schema compat plugin's internal searches. Those internal searches were prone to creating deadlocks (https://fedorahosted.org/freeipa/ticket/4586).

So fix 86b5dce does not trigger this new deadlock but rather reveals it.

Running several tests on the ipa-4-0 branch head with this fix, I reproduced not only the deadlock from this ticket but also a similar one involving retrocl (in automember):

Thread 2 (Thread 0x7f8ecf7fe700 (LWP 7722)):
#0  0x00007f8f136283f4 in pthread_rwlock_rdlock () from /lib64/libpthread.so.0
#1  0x00007f8f081c31d6 in backend_write_cb.isra.2.part () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#2  0x00007f8f081c40dc in backend_betxn_pre_write_cb () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#3  0x00007f8f15895d30 in plugin_call_func () from /usr/lib64/dirsrv/libslapd.so.0
#4  0x00007f8f15895f88 in plugin_call_plugins () from /usr/lib64/dirsrv/libslapd.so.0
#5  0x00007f8f09f6695e in ldbm_back_add () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#6  0x00007f8f1583ccba in op_shared_add () from /usr/lib64/dirsrv/libslapd.so.0
#7  0x00007f8f1583d523 in add_internal_pb () from /usr/lib64/dirsrv/libslapd.so.0
#8  0x00007f8f087f5f0d in retrocl_postob () from /usr/lib64/dirsrv/plugins/libretrocl-plugin.so
#9  0x00007f8f15895d30 in plugin_call_func () from /usr/lib64/dirsrv/libslapd.so.0
#10 0x00007f8f15895f88 in plugin_call_plugins () from /usr/lib64/dirsrv/libslapd.so.0
#11 0x00007f8f09f81c49 in ldbm_back_modify () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#12 0x00007f8f15883021 in op_shared_modify () from /usr/lib64/dirsrv/libslapd.so.0
#13 0x00007f8f15883ad4 in modify_internal_pb () from /usr/lib64/dirsrv/libslapd.so.0
#14 0x00007f8f0c86d975 in automember_add_member_value () from /usr/lib64/dirsrv/plugins/libautomember-plugin.so
#15 0x00007f8f0c86de13 in automember_update_membership () from /usr/lib64/dirsrv/plugins/libautomember-plugin.so
#16 0x00007f8f0c86e5aa in automember_rebuild_task_thread () from /usr/lib64/dirsrv/plugins/libautomember-plugin.so
#17 0x00007f8f13c84e3b in _pt_root () from /lib64/libnspr4.so
#18 0x00007f8f13624ee5 in start_thread () from /lib64/libpthread.so.0
#19 0x00007f8f13353b8d in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f8ee7fe7700 (LWP 7163)):
#0  0x00007f8f13628ca0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f8f0db821c3 in __db_hybrid_mutex_suspend () from /lib64/libdb-5.3.so
#2  0x00007f8f0db815a8 in __db_tas_mutex_lock () from /lib64/libdb-5.3.so
#3  0x00007f8f0dc2bfda in __lock_get_internal () from /lib64/libdb-5.3.so
#4  0x00007f8f0dc2cac0 in __lock_get () from /lib64/libdb-5.3.so
#5  0x00007f8f0dc586da in __db_lget () from /lib64/libdb-5.3.so
#6  0x00007f8f0db9f4a7 in __bam_search () from /lib64/libdb-5.3.so
#7  0x00007f8f0db8a126 in __bamc_search () from /lib64/libdb-5.3.so
#8  0x00007f8f0db8bbdf in __bamc_get () from /lib64/libdb-5.3.so
#9  0x00007f8f0dc45156 in __dbc_iget () from /lib64/libdb-5.3.so
#10 0x00007f8f0dc540b4 in __dbc_get_pp () from /lib64/libdb-5.3.so
#11 0x00007f8f09f531f0 in idl_new_fetch () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#12 0x00007f8f09f61656 in index_read_ext_allids () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#13 0x00007f8f09f4be44 in keys2idl () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#14 0x00007f8f09f4c5a3 in ava_candidates.isra () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#15 0x00007f8f09f4cb92 in filter_candidates_ext () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#16 0x00007f8f09f4dc06 in list_candidates () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#17 0x00007f8f09f4cb00 in filter_candidates_ext () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#18 0x00007f8f09f4dc06 in list_candidates () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#19 0x00007f8f09f4cb00 in filter_candidates_ext () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#20 0x00007f8f09f4e1fa in filter_candidates () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#21 0x00007f8f09f8971e in ldbm_back_search () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
#22 0x00007f8f15889a89 in op_shared_search () from /usr/lib64/dirsrv/libslapd.so.0
#23 0x00007f8f15899d0e in search_internal_callback_pb () from /usr/lib64/dirsrv/libslapd.so.0
#24 0x00007f8f081c7731 in backend_shr_update_references_cb () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#25 0x00007f8f081d4d0f in map_data_foreach_map () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#26 0x00007f8f081c555b in backend_shr_update_references () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#27 0x00007f8f081c66b6 in backend_shr_add_cb.part () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#28 0x00007f8f081c6811 in backend_shr_betxn_post_add_cb () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#29 0x00007f8f15895d30 in plugin_call_func () from /usr/lib64/dirsrv/libslapd.so.0
#30 0x00007f8f15895f88 in plugin_call_plugins () from /usr/lib64/dirsrv/libslapd.so.0
#31 0x00007f8f1585330a in dse_add () from /usr/lib64/dirsrv/libslapd.so.0
#32 0x00007f8f1583ccba in op_shared_add () from /usr/lib64/dirsrv/libslapd.so.0
#33 0x00007f8f1583e000 in do_add () from /usr/lib64/dirsrv/libslapd.so.0
#34 0x00007f8f15d62184 in connection_threadmain ()
#35 0x00007f8f13c84e3b in _pt_root () from /lib64/libnspr4.so
#36 0x00007f8f13624ee5 in start_thread () from /lib64/libpthread.so.0
#37 0x00007f8f13353b8d in clone () from /lib64/libc.so.6

Thread 10
      40 dd=121 locks held 0    write locks 0    pid/thread 7126/140251754297088 flags 0    priority 100
      40 READ          1 WAIT    userRoot/member.db        page         29


Thread 2
80000e87 dd= 0 locks held 30   write locks 29   pid/thread 7126/140251343349504 flags 0    priority 100
...
80000e87 WRITE         1 HELD    userRoot/member.db        page         29

Also CCing Ludwig so that he is aware of the proposals and comments in this ticket.

The problem can be reproduced quite easily with the automember unit tests. However, slowing down the process with additional logging usually prevents it.

The schema compat plugin does a lot of internal searches during the automember task:

 base="cn=anonymous-limits,cn=etc,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com" scope=0 filter="(|(objectclass=*)(objectclass=ldapsubentry))" 
 base="cn=b2b432e6-0490-479d-b3a9-5428ba815444,cn=automember rebuild membership,cn=tasks,cn=config" scope=0 filter="(objectclass=*)" 
 base="cn=config,cn=changelog,cn=ldbm database,cn=plugins,cn=config" scope=1 filter="objectclass=vlvsearch" 
 base="cn=config,cn=ipaca,cn=ldbm database,cn=plugins,cn=config" scope=1 filter="objectclass=vlvsearch" 
 base="cn=config,cn=userRoot,cn=ldbm database,cn=plugins,cn=config" scope=1 filter="objectclass=vlvsearch" 
 base="dc=idm,dc=lab,dc=bos,dc=redhat,dc=com" scope=2 filter="(krbPrincipalName=admin@IDM.LAB.BOS.REDHAT.COM)" 
 base="uid=admin,cn=users,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com" scope=0 filter="(|(objectclass=*)(objectclass=ldapsubentry))"

I was unable to compute the exact numbers, but they are roughly:

cn=config: ~150
main db: ~200
   uid=admin: ~90
   uid=anonymous: ~90
   SUFFIX: 20

Ignoring 'cn=config' (schema-compat-ignore-subtree: cn=config) is an INVALID workaround, because the compat plugin expects to update itself from config changes.

Next step: test a global lock that takes effect over all the backend locks, allowing only one update at a time across all backends.

Promising results from the global lock test.
With the fix, I am no longer able to reproduce a hang in the automember unit tests.
I also ran the full unit test suite, which was 100% successful.

The fix implements in DS a global lock covering all database backends and also the DSE frontend backend.
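A minimal sketch of why a global backend lock removes the inversion, in the same simplified model: every update, whether on an ldbm backend or on the cn=config frontend, takes one global lock before any backend or plugin lock, so the two lock orders can no longer interleave. GLOBAL_BACKEND_LOCK and run_serialized are hypothetical names; the DS prototype selects participating backends through its configuration entry, whereas this sketch simply assumes every update participates.

```python
import threading

# Hypothetical stand-in for the DS prototype's global backend lock.
GLOBAL_BACKEND_LOCK = threading.Lock()


def run_serialized(lock_timeout=0.3):
    page_lock = threading.Lock()  # member.db page lock (ldbm backend)
    map_lock = threading.Lock()   # schema-compat map lock
    results = {}

    def update(name, first, second):
        # Every update takes the global lock before any other lock, so at
        # most one update holds page/map locks at any time.
        with GLOBAL_BACKEND_LOCK:
            with first:
                got = second.acquire(timeout=lock_timeout)
                if got:
                    second.release()
                results[name] = got

    threads = [
        # automember modify (like Thread 2): page lock, then map lock
        threading.Thread(target=update, args=("modify", page_lock, map_lock)),
        # cn=config task add (like Thread 11): map lock, then page lock
        threading.Thread(target=update, args=("dse_add", map_lock, page_lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The cost of this design is that updates on unrelated backends can no longer run in parallel, which is why its performance impact has to be evaluated.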

An additional signature of the hang was found in today's CI tests:

  • There is a config update like Thread 11 (https://fedorahosted.org/freeipa/ticket/4635#comment:1)
  • It deadlocks with the automember task, like Thread 2 (https://fedorahosted.org/freeipa/ticket/4635#comment:1)
  • In addition, the retrocl housekeeping thread (Thread 34) is blocked by pages held by Thread 2

    58 dd=65 locks held 0 write locks 0 pid/thread 19827/140225984591616 flags 0 priority 100
    58 READ 1 WAIT changelog/changenumber.db page 1

    Thread 36 (Thread 0x7f88e7fff700 (LWP 19838)):
    #0  0x00007f88fe3ccca0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    #1  0x00007f88f89261c3 in __db_hybrid_mutex_suspend () from /lib64/libdb-5.3.so
    #2  0x00007f88f89255a8 in __db_tas_mutex_lock () from /lib64/libdb-5.3.so
    #3  0x00007f88f89cffda in __lock_get_internal () from /lib64/libdb-5.3.so
    #4  0x00007f88f89d0ac0 in __lock_get () from /lib64/libdb-5.3.so
    #5  0x00007f88f89fc6da in __db_lget () from /lib64/libdb-5.3.so
    #6  0x00007f88f8942612 in __bam_get_root () from /lib64/libdb-5.3.so
    #7  0x00007f88f89428cf in __bam_search () from /lib64/libdb-5.3.so
    #8  0x00007f88f892e126 in __bamc_search () from /lib64/libdb-5.3.so
    #9  0x00007f88f892fe74 in __bamc_get () from /lib64/libdb-5.3.so
    #10 0x00007f88f89e9156 in __dbc_iget () from /lib64/libdb-5.3.so
    #11 0x00007f88f89f80b4 in __dbc_get_pp () from /lib64/libdb-5.3.so
    #12 0x00007f88f4d3b6cd in ldbm_back_seq () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
    #13 0x00007f890063d8b8 in seq_internal_callback_pb () from /usr/lib64/dirsrv/libslapd.so.0
    #14 0x00007f890063da76 in slapi_seq_callback () from /usr/lib64/dirsrv/libslapd.so.0
    #15 0x00007f88f359870e in retrocl_getchangetime () from /usr/lib64/dirsrv/plugins/libretrocl-plugin.so
    #16 0x00007f88f359ac17 in retrocl_housekeeping () from /usr/lib64/dirsrv/plugins/libretrocl-plugin.so
    #17 0x00007f89006021ea in eq_loop () from /usr/lib64/dirsrv/libslapd.so.0
    #18 0x00007f88fea28e3b in _pt_root () from /lib64/libnspr4.so
    #19 0x00007f88fe3c8ee5 in start_thread () from /lib64/libpthread.so.0
    #20 0x00007f88fe0f7b8d in clone () from /lib64/libc.so.6

Note: the root cause of the hang remains Thread 11 and Thread 2, but here Thread 34 can be an additional signature of that hang.

The deadlock reported in https://fedorahosted.org/freeipa/ticket/4635#comment:11 appears with the 389-ds master branch and the ipa-4-0 branch, both from today's builds.

So 389-ds did not contain the fix for https://fedorahosted.org/389/ticket/47936.

I made a second prototype 389-ds build with the global backend lock.
It is available via dnf copr enable tbordaz/F20buildDS389 (https://copr.fedoraproject.org/coprs/tbordaz/F20buildDS389/build/54660/)

Once the DS instance is created, it must be stopped and configured as follows:

dn: cn=global backend lock,cn=config
objectClass: top
objectClass: extensibleobject
cn: global backend lock
backend-type: ldbm database     <<<< this value to lock all database backend
backend-name: frontend-internal <<<< this value to lock 'cn=config'

Then the instance can be restarted.

The next step is to evaluate the performance impact of this fix. A good approach is to use an IPA deployment, to benefit from all the plugins, and then run some QE tools to do stress tests.
When testing without the fix, the stress tests should not trigger a deadlock, especially with configuration changes during the tests.

Thanks for the investigation. As soon as the bug is fixed in 389-ds-base, we should bump the Requires in IPA.

  • As mentioned in https://fedorahosted.org/389/ticket/47936#comment:5, an alternative to the global backend lock was tested, using slapi_back_transaction_begin/slapi_back_transaction_commit
  • slapi_back_transaction_begin/slapi_back_transaction_commit is not able to prevent the deadlock. This is the same deadlock stack as in https://fedorahosted.org/freeipa/ticket/4635#comment:1

    • The automember task modifies a static group to add a member. It holds a write lock on a member.db page (under a transaction), starts a transaction (slapi_back_transaction_begin), and tries to acquire the schema compat map lock:

      member: fqdn=web1.idm.lab.bos.redhat.com,cn=computers,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com
      group : cn=hostgroup1,cn=hostgroups,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com

    • A task entry is added (in cn=config), and its post-op add acquires the schema compat map lock. Before acquiring the map lock it started a transaction (slapi_back_transaction_begin), but on cn=config (not on a bdb backend). It then tries to acquire (read) a member.db page to do the following search:

    Added entry: cn=cf58f2cd-8015-48e8-88f9-014a58fa830c,cn=automember rebuild membership,cn=tasks,cn=config
    Internal search
    base: "cn=groups,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com"
    backend "userRoot"
    scope: one level
    filter: member=cn=cf58f2cd-8015-48e8-88f9-014a58fa830c,cn=automember rebuild membership,cn=tasks,cn=config

    slapi_back_transaction_begin/slapi_back_transaction_commit can only prevent the deadlock if all callers start transactions on bdb backends. As ADDing the task entry is done on a non-bdb backend, slapi_back_transaction_* cannot prevent the deadlock.
    
  • The internal search looks useless, as it makes no sense to look for a task entry in an accounts group. A possible fix in schema compat would be to prevent those searches. However, it could be difficult to filter them, and it is not certain that we can prevent ALL internal searches (triggered by schema compat) on the database.

  • The remaining solution is to use a global backend lock (see https://fedorahosted.org/freeipa/ticket/4635#comment:13). Some performance tests are still needed. I will prepare a review of this 389-ds patch.

  • These new hangs are a side effect of: 86b5dce Ignore irrelevant subtrees in schema compat plugin
    This fix defines values for schema-compat-restrict-subtree, so it overrides the default value,
    which was 'cn=tasks,cn=config'.

  • There are two possible fixes that seem to prevent these hangs.
    I have been able to test them on top of 86b5dce (using the automember suite).
    I was not able to test them with the branch (4-0 or 4-1) because of another problem
    I had starting pki-tomcat.
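The slapi_back_transaction_* point can be illustrated with a waits-for graph. This is a simplified sketch, not Berkeley DB's actual deadlock detector: each edge records which thread waits for which, and on what kind of lock. A detector that only sees Berkeley DB lock waits misses the cycle, because one edge of it is the schema-compat map lock (a pthread rwlock outside the BDB lock table). The helper names are hypothetical.

```python
from collections import defaultdict


def has_cycle(edges):
    """Detect a cycle in a waits-for graph given as (waiter, holder, kind)."""
    graph = defaultdict(list)
    for waiter, holder, _kind in edges:
        graph[waiter].append(holder)

    state = {}  # node -> "visiting" or "done"

    def visit(node):
        state[node] = "visiting"
        for nxt in graph[node]:
            if state.get(nxt) == "visiting":
                return True  # back edge: cycle found
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in list(graph))


# The two waits from this ticket (Thread 2 vs Thread 11).
WAITS = [
    ("thread2", "thread11", "map rwlock"),     # schema-compat map lock
    ("thread11", "thread2", "bdb page lock"),  # userRoot/member.db page 18
]


def bdb_visible(edges):
    """Keep only the waits that a Berkeley DB-only detector could see."""
    return [e for e in edges if e[2] == "bdb page lock"]
```

has_cycle(WAITS) is True, while has_cycle(bdb_visible(WAITS)) is False: from the DB's point of view Thread 11 is simply waiting on a page lock, so nothing is aborted.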

The first fix is to restrict the scope of Schema Compat to the database and its own config:

dn: cn=computers,cn=Schema Compatibility,cn=plugins,cn=config
schema-compat-restrict-subtree: dc=idm,dc=lab,dc=bos,dc=redhat,dc=com
schema-compat-restrict-subtree: cn=Schema Compatibility,cn=plugins,cn=config

dn: cn=groups,cn=Schema Compatibility,cn=plugins,cn=config
schema-compat-restrict-subtree: dc=idm,dc=lab,dc=bos,dc=redhat,dc=com
schema-compat-restrict-subtree: cn=Schema Compatibility,cn=plugins,cn=config

dn: cn=ng,cn=Schema Compatibility,cn=plugins,cn=config
schema-compat-restrict-subtree: dc=idm,dc=lab,dc=bos,dc=redhat,dc=com
schema-compat-restrict-subtree: cn=Schema Compatibility,cn=plugins,cn=config

dn: cn=sudoers,cn=Schema Compatibility,cn=plugins,cn=config
schema-compat-restrict-subtree: dc=idm,dc=lab,dc=bos,dc=redhat,dc=com
schema-compat-restrict-subtree: cn=Schema Compatibility,cn=plugins,cn=config

dn: cn=users,cn=Schema Compatibility,cn=plugins,cn=config
schema-compat-restrict-subtree: dc=idm,dc=lab,dc=bos,dc=redhat,dc=com
schema-compat-restrict-subtree: cn=Schema Compatibility,cn=plugins,cn=config

The second fix is to add:

dn: cn=computers,cn=Schema Compatibility,cn=plugins,cn=config
schema-compat-ignore-subtree: cn=changelog
schema-compat-ignore-subtree: o=ipaca
schema-compat-ignore-subtree: cn=tasks,cn=config

dn: cn=groups,cn=Schema Compatibility,cn=plugins,cn=config
schema-compat-ignore-subtree: cn=changelog
schema-compat-ignore-subtree: o=ipaca
schema-compat-ignore-subtree: cn=tasks,cn=config

dn: cn=ng,cn=Schema Compatibility,cn=plugins,cn=config
schema-compat-ignore-subtree: cn=changelog
schema-compat-ignore-subtree: o=ipaca
schema-compat-ignore-subtree: cn=tasks,cn=config

dn: cn=sudoers,cn=Schema Compatibility,cn=plugins,cn=config
schema-compat-ignore-subtree: cn=changelog
schema-compat-ignore-subtree: o=ipaca
schema-compat-ignore-subtree: cn=tasks,cn=config

dn: cn=users,cn=Schema Compatibility,cn=plugins,cn=config
schema-compat-ignore-subtree: cn=changelog
schema-compat-ignore-subtree: o=ipaca
schema-compat-ignore-subtree: cn=tasks,cn=config
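Both fixes aim to keep the task entry under cn=tasks,cn=config out of the plugin's scope, so that its post-op add has no reason to search the main DB. A simplified sketch of that scoping decision, assuming restrict-subtree limits processing to the listed subtrees and ignore-subtree excludes the listed subtrees; DN handling is naive (no escaped commas, only lowercasing) and the helper names are hypothetical, not slapi-nis code.

```python
def normalize(dn):
    """Naive DN normalization: lowercase, trim spaces around commas."""
    return ",".join(part.strip().lower() for part in dn.split(","))


def in_subtree(dn, base):
    dn, base = normalize(dn), normalize(base)
    return dn == base or dn.endswith("," + base)


def schema_compat_considers(dn, restrict=(), ignore=()):
    """Hypothetical model: is dn inside the plugin's configured scope?"""
    if restrict and not any(in_subtree(dn, b) for b in restrict):
        return False
    return not any(in_subtree(dn, b) for b in ignore)


TASK_DN = ("cn=cf58f2cd-8015-48e8-88f9-014a58fa830c,"
           "cn=automember rebuild membership,cn=tasks,cn=config")
ADMIN_DN = "uid=admin,cn=users,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com"

# First fix: restrict to the main DB and the plugin's own config.
RESTRICT = ["dc=idm,dc=lab,dc=bos,dc=redhat,dc=com",
            "cn=Schema Compatibility,cn=plugins,cn=config"]
# Second fix: ignore list that includes cn=tasks,cn=config.
IGNORE = ["cn=changelog", "o=ipaca", "cn=tasks,cn=config"]
```

For example, schema_compat_considers(TASK_DN, restrict=RESTRICT) and schema_compat_considers(TASK_DN, ignore=IGNORE) are both False, while ADMIN_DN stays in scope under both settings.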

  • The previous fixes prevent Schema Compat from doing internal searches (on the main DB)
    during an update (adding a task) on the 'cn=config' backend.
    If, in other use cases, this is not possible and Schema Compat needs to do internal searches (on the main DB),
    then a global lock (covering BDB and the other backends) will be required,
    like the fix in https://fedorahosted.org/389/ticket/47936.

  • I am still trying to run a full test of these two fixes.

Both fixes (ignore-subtree += cn=tasks,cn=config, restrict-subtree = <maindb>+<schema compat config>) have now been tested with make-test (I was hitting https://fedorahosted.org/freeipa/ticket/4666 during the tests).

Preparing a patch.

Forgot to set the flag; the patch was reviewed.

master:

  • 85eb175 Deadlock in schema compat plugin (between automember_update_membership task and dse update)

ipa-4-1:

  • f0bcf2b Deadlock in schema compat plugin (between automember_update_membership task and dse update)

This was fixed also in 4.0.5.

Metadata Update from @tbordaz:
- Issue assigned to tbordaz
- Issue set to the milestone: FreeIPA 4.0.5

2 years ago
