#391 Slapd crashes when deleting backends while operations are still in progress
Closed: Fixed None Opened 8 years ago by rmeggins.

https://bugzilla.redhat.com/show_bug.cgi?id=829432 (Red Hat Enterprise Linux 6)

Description of problem: slapd crashes while running the subtree rename stress
tests.
I observed the crash after successful modrdn operations and an ldapsearch
command.


Version-Release number of selected component (if applicable):
389-ds-base-1.2.10.2


How reproducible: Consistently.


Steps to Reproduce:
1. Install 389-ds-base 1.2.10.2 or later.
2. Enable the debug repos and run "debuginfo-install 389-ds-base" to install
the debuginfo packages needed to analyze the core.PID files.
See http://port389.org/wiki/FAQ#Debugging_Crashes
3. From the TET RHEL63 branch, run the stress tests for subtree renames. Choose
only the stress_03_02 test. Remove the cleanup (ic9) test from iclist.
4. The stress test takes about 2 to 3 hours to complete. At the end, the test
reports PASS.
5. Check whether the slapd instance is still running. If it is not, the
server crashed.
6. Check for the core file under /var/log/dirsrv/slapd-$inst/core.$PID.

Actual results: Slapd crashed and core files were generated.

Expected results: Slapd should not crash.


Additional info: Stress test report -
http://hp-z600-01.rhts.eng.bos.redhat.com/qa/archive/ds/90/stress_ds90/Linux/20120605-104757.html

I followed the steps. My server is 1.3.0.a1 (local build) on F17. The server did not crash, but the error log showed these problems:
[..] - libdb: BDB2055 Lock table is out of available lock entries
[..] entryrdn-index - _entryrdn_del_data: Deleting P92357 failed; Cannot allocate memory(12)

BDB version: libdb-5.2.36-5.fc17.x86_64

The default dblayer_lock_config value is 10000.
dblayer.c: pEnv->set_lk_max_locks(pEnv, priv->dblayer_lock_config);
dblayer.c: pEnv->set_lk_max_objects(pEnv, priv->dblayer_lock_config);
dblayer.c: pEnv->set_lk_max_lockers(pEnv, priv->dblayer_lock_config);

Running the test with dblayer_lock_config value 40000...

Replying to [comment:5 nhosoi]:

We should check 1.2.11.

1.2.11.16 finished just fine (w/o the lock table errors) on FC17.

[..] - slapd started. Listening on All Interfaces port 10389 for LDAP requests
[..] - ldbm: Bringing stress_03_02_DBusr offline...
[..] - ldbm: removing 'stress_03_02_DBusr'.
[..] - Destructor for instance stress_03_02_DBusr called

The server is still up and running.

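I could reproduce the problem! Valgrind shows a modrdn thread reading a DB handle that a concurrent delete of the index configuration had already closed and freed: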
==12483== Thread 23:
==12483== Invalid read of size 8
==12483== at 0xA5B5650: entryrdn_lookup_dn (ldbm_entryrdn.c:1208)
==12483== by 0xA5890D2: id2entry (id2entry.c:378)
==12483== by 0xA5C5368: moddn_get_children (ldbm_modrdn.c:1992)
==12483== by 0xA5C24CB: ldbm_back_modrdn (ldbm_modrdn.c:814)
==12483== by 0x4CA0DA0: op_shared_rename (modrdn.c:664)
==12483== by 0x4CA00A2: do_modrdn (modrdn.c:268)
==12483== by 0x414833: connection_dispatch_operation (connection.c:588)
==12483== by 0x41617D: connection_threadmain (connection.c:2353)
==12483== by 0x57D6C72: ??? (in /usr/lib64/libnspr4.so)
==12483== by 0x34FE407D13: start_thread (in /usr/lib64/libpthread-2.15.so)
==12483== by 0x34FE0F167C: clone (in /usr/lib64/libc-2.15.so)
==12483== Address 0xfe46c28 is 584 bytes inside a block of size 1,416 free'd
==12483== at 0x4A079AE: free (vg_replace_malloc.c:427)
==12483== by 0x35024E8678: __db_close (in /usr/lib64/libdb-5.2.so)
==12483== by 0x35024F8E7C: __db_close_pp (in /usr/lib64/libdb-5.2.so)
==12483== by 0xA57B023: dblayer_close_file (dblayer.c:3113)
==12483== by 0xA57B495: dblayer_erase_index_file_ex (dblayer.c:3343)
==12483== by 0xA57B70A: dblayer_erase_index_file (dblayer.c:3400)
==12483== by 0xA5BB732: ldbm_instance_index_config_delete_callback (ldbm_index_config.c:182)
==12483== by 0x4C69F75: dse_call_callback (dse.c:2394)
==12483== by 0x4C69ACF: dse_delete (dse.c:2291)
==12483== by 0x4C5F5A9: op_shared_delete (delete.c:364)
==12483== by 0x4C5EE72: do_delete (delete.c:128)
==12483== by 0x414811: connection_dispatch_operation (connection.c:583)

Bug Description: The backend deletion code,
ldbm_instance_delete_instance_entry_callback, did not check whether
ordinary operations were still accessing the backend instance. The
backend instance could be deleted even while operations were in
progress, which crashed the server.

Fix Description: The backend struct ldbm_instance had a member,
inst_ref_count, that was not used. This patch changes its type from
PRInt32 to Slapi_Counter and increments it while the backend instance
is in use. The delete code checks the counter and, if it is greater
than 0, returns SLAPI_DSE_CALLBACK_ERROR.
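
For readers who want to see the shape of the fix, here is a minimal, self-contained sketch of the reference-count guard described above. It is not the patch itself: it uses C11 atomics in place of Slapi_Counter so that it compiles without the server headers, and the names instance, instance_acquire, instance_release, instance_delete_callback, CALLBACK_OK and CALLBACK_ERROR are hypothetical stand-ins chosen for illustration. Decrementing the counter when an operation finishes is also an assumption; the ticket only mentions the increment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Placeholder return codes for this sketch only; the real callback
 * returns the server's SLAPI_DSE_CALLBACK_* values. */
enum { CALLBACK_OK = 1, CALLBACK_ERROR = -1 };

/* Hypothetical stand-in for the backend instance struct. */
typedef struct instance {
    atomic_long ref_count; /* operations currently using the instance */
    bool deleted;          /* set once the instance has been torn down */
} instance;

/* Called when an operation (add, search, modrdn, ...) starts using
 * the backend instance. */
static void instance_acquire(instance *inst)
{
    atomic_fetch_add(&inst->ref_count, 1);
}

/* Called when the operation is done, on every exit path (assumed). */
static void instance_release(instance *inst)
{
    long prev = atomic_fetch_sub(&inst->ref_count, 1);
    assert(prev > 0); /* a release without a matching acquire is a bug */
    (void)prev;
}

/* Delete callback: refuse to tear down the instance while any
 * operation still holds a reference to it. */
static int instance_delete_callback(instance *inst)
{
    if (atomic_load(&inst->ref_count) > 0) {
        return CALLBACK_ERROR; /* busy: the delete is rejected */
    }
    inst->deleted = true; /* real code would close and remove DB files here */
    return CALLBACK_OK;
}
```

With a zero-initialized instance, a delete attempted between instance_acquire and instance_release is rejected, and the same delete succeeds once the operation has released its reference. The sketch does not close the window between checking the counter and tearing the instance down; the ticket does not say how the real code handles an operation that arrives in that window.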

Reviewed by Rich (Thank you!!)

Pushed to master.
{{{
$ git merge trac391
Updating caf2feb..7f81635
Fast-forward
ldap/servers/slapd/back-ldbm/back-ldbm.h | 4 +--
ldap/servers/slapd/back-ldbm/dblayer.c | 22 ++++++++----
ldap/servers/slapd/back-ldbm/id2entry.c | 6 +++
ldap/servers/slapd/back-ldbm/import-merge.c | 4 +-
ldap/servers/slapd/back-ldbm/instance.c | 4 ++
ldap/servers/slapd/back-ldbm/ldbm_add.c | 18 ++++++++--
ldap/servers/slapd/back-ldbm/ldbm_bind.c | 33 +++++++++++++----
ldap/servers/slapd/back-ldbm/ldbm_compare.c | 18 ++++++++--
ldap/servers/slapd/back-ldbm/ldbm_delete.c | 13 ++++++-
ldap/servers/slapd/back-ldbm/ldbm_index_config.c | 8 ++++-
.../servers/slapd/back-ldbm/ldbm_instance_config.c | 3 +-
ldap/servers/slapd/back-ldbm/ldbm_modify.c | 11 ++++++
ldap/servers/slapd/back-ldbm/ldbm_modrdn.c | 12 ++++++
ldap/servers/slapd/back-ldbm/ldbm_search.c | 39 ++++++++++++--------
ldap/servers/slapd/back-ldbm/ldif2ldbm.c | 3 +-
ldap/servers/slapd/back-ldbm/misc.c | 10 ++++-
ldap/servers/slapd/back-ldbm/proto-back-ldbm.h | 3 +-
ldap/servers/slapd/back-ldbm/vlv.c | 2 +-
18 files changed, 163 insertions(+), 50 deletions(-)
$ git push origin master
Counting objects: 46, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (24/24), done.
Writing objects: 100% (24/24), 12.04 KiB, done.
Total 24 (delta 21), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
caf2feb..7f81635 master -> master
}}}

Metadata Update from @nhosoi:
- Issue assigned to nhosoi
- Issue set to the milestone: 1.3.0.rc1
