https://bugzilla.redhat.com/show_bug.cgi?id=829432 (Red Hat Enterprise Linux 6)
Description of problem:
slapd crashes while running the subtree rename stress tests. I observed this crash after successful modrdn operations and an ldapsearch command.

Version-Release number of selected component (if applicable):
389-ds-base-1.2.10.2

How reproducible:
Consistently.

Steps to Reproduce:
1. Install 389-ds-base 1.2.10.2 or the latest version.
2. Enable the debug repos and run "debuginfo-install 389-ds-base" to install the debuginfo packages needed to analyze the core.PID files. Refer to http://port389.org/wiki/FAQ#Debugging_Crashes
3. From the TET RHEL63 branch, run the stress tests for subtree renames. Choose only the stress_03_02 test, and remove the cleanup(ic9) test from the iclist.
4. The stress test takes about 2 to 3 hours to complete. At the end, the test reports PASS.
5. Check whether the slapd instance is running. If it is not, the server crashed.
6. Check for the core file at /var/log/dirsrv/slapd-$inst/core.$PID.

Actual results:
slapd crashed and core files were generated.

Expected results:
slapd should not crash.

Additional info:
Stress test report - http://hp-z600-01.rhts.eng.bos.redhat.com/qa/archive/ds/90/stress_ds90/Linux/20120605-104757.html
I followed the steps. My server is 1.3.0.a1 (local build) on F17. The server did not crash, but the error log reported these problems:
{{{
[..] - libdb: BDB2055 Lock table is out of available lock entries
[..] entryrdn-index - _entryrdn_del_data: Deleting P92357 failed; Cannot allocate memory(12)
}}}
BDB version: libdb-5.2.36-5.fc17.x86_64
The default dblayer_lock_config value is 10000; dblayer.c uses it for all three lock limits:
{{{
pEnv->set_lk_max_locks(pEnv, priv->dblayer_lock_config);
pEnv->set_lk_max_objects(pEnv, priv->dblayer_lock_config);
pEnv->set_lk_max_lockers(pEnv, priv->dblayer_lock_config);
}}}
Running the test with dblayer_lock_config value 40000...
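To make the lock table failure above concrete, here is a toy model in C. It is not libdb code: `lock_table`, `lock_acquire`, `lock_release` and `lock_many` are hypothetical names. It assumes only what the log and dblayer.c show: the number of lock entries is fixed when the environment is set up (via `set_lk_max_locks()` with `dblayer_lock_config`), and once every entry is held the next request fails with `ENOMEM`, which `strerror()` renders as "Cannot allocate memory" (errno 12 on Linux).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy model, not libdb: a lock table whose capacity is fixed
 * when the environment is opened, like set_lk_max_locks(dblayer_lock_config). */
typedef struct {
    size_t max_locks; /* analogous to dblayer_lock_config (default 10000) */
    size_t in_use;    /* lock entries currently held */
} lock_table;

/* Take one lock entry; fail with ENOMEM once the table is full. */
int lock_acquire(lock_table *t)
{
    if (t->in_use >= t->max_locks)
        return ENOMEM;
    t->in_use++;
    return 0;
}

/* Return one lock entry to the table. */
void lock_release(lock_table *t)
{
    assert(t->in_use > 0); /* releasing more than was acquired is a bug */
    t->in_use--;
}

/* Take n entries at once, as one large transaction might (for example one
 * that touches many entryrdn records). All-or-nothing: on failure, the
 * entries taken so far are released and ENOMEM is returned. */
int lock_many(lock_table *t, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (lock_acquire(t) != 0) {
            while (i-- > 0) /* undo the i entries already taken */
                lock_release(t);
            return ENOMEM;
        }
    }
    return 0;
}
```

In this model, an illustrative transaction needing 25,000 entries fails against a 10000-entry table but fits in a 40000-entry one, which is presumably the effect the rerun with a larger `dblayer_lock_config` is probing.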
Replying to [comment:5 nhosoi]:
We should check 1.2.11.
1.2.11.16 finished just fine (without the lock table errors) on F17.
{{{
[..] - slapd started. Listening on All Interfaces port 10389 for LDAP requests
[..] - ldbm: Bringing stress_03_02_DBusr offline...
[..] - ldbm: removing 'stress_03_02_DBusr'.
[..] - Destructor for instance stress_03_02_DBusr called
}}}
The server is still up and running.
I could reproduce the problem!
{{{
==12483== Thread 23:
==12483== Invalid read of size 8
==12483==    at 0xA5B5650: entryrdn_lookup_dn (ldbm_entryrdn.c:1208)
==12483==    by 0xA5890D2: id2entry (id2entry.c:378)
==12483==    by 0xA5C5368: moddn_get_children (ldbm_modrdn.c:1992)
==12483==    by 0xA5C24CB: ldbm_back_modrdn (ldbm_modrdn.c:814)
==12483==    by 0x4CA0DA0: op_shared_rename (modrdn.c:664)
==12483==    by 0x4CA00A2: do_modrdn (modrdn.c:268)
==12483==    by 0x414833: connection_dispatch_operation (connection.c:588)
==12483==    by 0x41617D: connection_threadmain (connection.c:2353)
==12483==    by 0x57D6C72: ??? (in /usr/lib64/libnspr4.so)
==12483==    by 0x34FE407D13: start_thread (in /usr/lib64/libpthread-2.15.so)
==12483==    by 0x34FE0F167C: clone (in /usr/lib64/libc-2.15.so)
==12483==  Address 0xfe46c28 is 584 bytes inside a block of size 1,416 free'd
==12483==    at 0x4A079AE: free (vg_replace_malloc.c:427)
==12483==    by 0x35024E8678: __db_close (in /usr/lib64/libdb-5.2.so)
==12483==    by 0x35024F8E7C: __db_close_pp (in /usr/lib64/libdb-5.2.so)
==12483==    by 0xA57B023: dblayer_close_file (dblayer.c:3113)
==12483==    by 0xA57B495: dblayer_erase_index_file_ex (dblayer.c:3343)
==12483==    by 0xA57B70A: dblayer_erase_index_file (dblayer.c:3400)
==12483==    by 0xA5BB732: ldbm_instance_index_config_delete_callback (ldbm_index_config.c:182)
==12483==    by 0x4C69F75: dse_call_callback (dse.c:2394)
==12483==    by 0x4C69ACF: dse_delete (dse.c:2291)
==12483==    by 0x4C5F5A9: op_shared_delete (delete.c:364)
==12483==    by 0x4C5EE72: do_delete (delete.c:128)
==12483==    by 0x414811: connection_dispatch_operation (connection.c:583)
}}}
git patch file (master) 0001-Trac-Ticket-391-Slapd-crashes-when-deleting-backends.patch
Bug Description: The backend deletion code, ldbm_instance_delete_instance_entry_callback, did not check whether ordinary operations were still accessing the backend instance. The instance could therefore be deleted while some operations were still in progress, which crashed the server.
Fix Description: The backend struct ldbm_instance had an unused member, inst_ref_count. This patch changes its type from PRInt32 to Slapi_Counter and increments it while the backend instance is in use. The delete code checks the counter, and if it is greater than 0, it returns SLAPI_DSE_CALLBACK_ERROR.
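The reference-counting idea in the fix can be sketched in portable C11. This is not the patch: the patch uses Slapi_Counter on ldbm_instance and a plain "counter > 0" check in the delete callback, while this sketch uses a hypothetical `backend_instance` struct and `<stdatomic.h>`. It also swaps in one refinement that is an assumption, not something taken from the patch: deletion atomically moves the counter from 0 to a "deleted" sentinel with compare-and-swap, so no operation can acquire the instance between the check and the deletion. A `false` return from the hypothetical `backend_try_delete()` plays the role of SLAPI_DSE_CALLBACK_ERROR.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Sentinel meaning "instance deleted; no new operations may start". */
#define BACKEND_DELETED (-1)

/* Hypothetical stand-in for ldbm_instance; only the counter is modelled. */
typedef struct {
    atomic_int ref_count; /* >= 0: operations in progress; -1: deleted */
} backend_instance;

/* Called when an operation (add, bind, compare, search, ...) starts using
 * the instance. Returns false if the instance has already been deleted. */
bool backend_acquire(backend_instance *inst)
{
    int cur = atomic_load(&inst->ref_count);
    while (cur != BACKEND_DELETED) {
        /* On failure, cur is reloaded with the current value; retry. */
        if (atomic_compare_exchange_weak(&inst->ref_count, &cur, cur + 1))
            return true;
    }
    return false;
}

/* Called when the operation is finished with the instance. */
void backend_release(backend_instance *inst)
{
    int prev = atomic_fetch_sub(&inst->ref_count, 1);
    assert(prev > 0); /* every release must match an earlier acquire */
    (void)prev;       /* silence unused-variable warnings under NDEBUG */
}

/* Delete callback: succeeds only if no operation holds a reference.
 * Returning false corresponds to SLAPI_DSE_CALLBACK_ERROR in the patch. */
bool backend_try_delete(backend_instance *inst)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&inst->ref_count, &expected,
                                          BACKEND_DELETED);
}

/* Hypothetical wrapper showing the acquire/release bracket around an op. */
int run_operation(backend_instance *inst, int (*op)(backend_instance *))
{
    if (!backend_acquire(inst))
        return -1; /* fail the operation instead of touching freed state */
    int rc = op(inst);
    backend_release(inst);
    return rc;
}
```

With this ordering, a concurrent operation either acquires the instance first, in which case the delete is refused, or starts after the delete and fails cleanly, rather than reading a DB handle that has already been freed.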
Reviewed by Rich (Thank you!!)
Pushed to master.
{{{
$ git merge trac391
Updating caf2feb..7f81635
Fast-forward
 ldap/servers/slapd/back-ldbm/back-ldbm.h           |  4 +--
 ldap/servers/slapd/back-ldbm/dblayer.c             | 22 ++++++++----
 ldap/servers/slapd/back-ldbm/id2entry.c            |  6 +++
 ldap/servers/slapd/back-ldbm/import-merge.c        |  4 +-
 ldap/servers/slapd/back-ldbm/instance.c            |  4 ++
 ldap/servers/slapd/back-ldbm/ldbm_add.c            | 18 ++++++++--
 ldap/servers/slapd/back-ldbm/ldbm_bind.c           | 33 +++++++++++++----
 ldap/servers/slapd/back-ldbm/ldbm_compare.c        | 18 ++++++++--
 ldap/servers/slapd/back-ldbm/ldbm_delete.c         | 13 ++++++-
 ldap/servers/slapd/back-ldbm/ldbm_index_config.c   |  8 ++++-
 .../servers/slapd/back-ldbm/ldbm_instance_config.c |  3 +-
 ldap/servers/slapd/back-ldbm/ldbm_modify.c         | 11 ++++++
 ldap/servers/slapd/back-ldbm/ldbm_modrdn.c         | 12 ++++++
 ldap/servers/slapd/back-ldbm/ldbm_search.c         | 39 ++++++++++++--------
 ldap/servers/slapd/back-ldbm/ldif2ldbm.c           |  3 +-
 ldap/servers/slapd/back-ldbm/misc.c                | 10 ++++-
 ldap/servers/slapd/back-ldbm/proto-back-ldbm.h     |  3 +-
 ldap/servers/slapd/back-ldbm/vlv.c                 |  2 +-
 18 files changed, 163 insertions(+), 50 deletions(-)
$ git push origin master
Counting objects: 46, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (24/24), done.
Writing objects: 100% (24/24), 12.04 KiB, done.
Total 24 (delta 21), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   caf2feb..7f81635  master -> master
}}}
Metadata Update from @nhosoi:
- Issue assigned to nhosoi
- Issue set to the milestone: 1.3.0.rc1
389-ds-base is moving from Pagure to GitHub. This means that new issues and pull requests will be accepted only in 389-ds-base's GitHub repository.
This issue has been cloned to GitHub and is available here: https://github.com/389ds/389-ds-base/issues/391
If you want to receive further updates on the issue, please navigate to the GitHub issue and click the Subscribe button.
Thank you for understanding. We apologize for any inconvenience.
Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)