#48149 ns-slapd double free or corruption crash
Closed: Fixed None Opened 4 years ago by lkrispen.

DS crashes occasionally when performing a cn=monitor search.

This bug was reported in bz1203338; details are there.

The core issue is in libdb (see bz 1211871), but it could possibly be fixed in DS by preventing db_open calls and memp_stat calls from running in parallel.


I wrote a lib389 test and got the following crash:

Core was generated by `./ns-slapd -D /root/389TEST/install/etc/dirsrv/slapd-standalone -i /root/389TES'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f9bc6bb6887 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install audit-libs-2.4-1.fc20.x86_64 cyrus-sasl-gssapi-2.1.26-14.fc20.x86_64 cyrus-sasl-lib-2.1.26-14.fc20.x86_64 cyrus-sasl-md5-2.1.26-14.fc20.x86_64 glibc-2.18-14.fc20.x86_64 keyutils-libs-1.5.9-1.fc20.x86_64 krb5-libs-1.11.5-11.fc20.x86_64 libcom_err-1.42.8-3.fc20.x86_64 libdb-5.3.28-1.fc20.x86_64 libgcc-4.8.3-7.fc20.x86_64 libicu-50.1.2-11.fc20.x86_64 libselinux-2.2.1-6.fc20.x86_64 libstdc++-4.8.3-7.fc20.x86_64 nspr-4.10.7-1.fc20.x86_64 nss-3.17.0-1.fc20.x86_64 nss-softokn-3.17.0-1.fc20.x86_64 nss-softokn-freebl-3.17.0-1.fc20.x86_64 nss-util-3.17.0-1.fc20.x86_64 openssl-libs-1.0.1e-39.fc20.x86_64 pam-1.1.8-1.fc20.x86_64 pcre-8.33-6.fc20.x86_64 sqlite-3.8.6-2.fc20.x86_64 svrcore-4.0.4-10.fc20.x86_64 xz-libs-5.1.2-12alpha.fc20.x86_64 zlib-1.2.8-3.fc20.x86_64
(gdb) bt
#0 0x00007f9bc6bb6887 in raise () from /lib64/libc.so.6
#1 0x00007f9bc6bb7f78 in abort () from /lib64/libc.so.6
#2 0x00007f9bc6bf6ad4 in __libc_message () from /lib64/libc.so.6
#3 0x00007f9bc6bfddf8 in _int_free () from /lib64/libc.so.6
#4 0x00007f9bc9168eb6 in slapi_ch_free (ptr=ptr@entry=0x7f9bb27f2ac0) at /root/389TEST/workspaces/389-ds-base/ds/ldap/servers/slapd/ch_malloc.c:363
#5 0x00007f9bbf305c3c in ldbm_back_monitor_instance_search (pb=<optimized out>, e=0x7f9b9400fe70, entryAfter=<optimized out>, returncode=0x7f9bb27f4c84, returntext=<optimized out>, arg=0x1272880) at /root/389TEST/workspaces/389-ds-base/ds/ldap/servers/slapd/back-ldbm/monitor.c:260
#6 0x00007f9bc917329b in dse_call_callback (pb=pb@entry=0x7f9bb27fbae0, operation=operation@entry=4, flags=1, entryBefore=entryBefore@entry=0x7f9b9400fe70, entryAfter=entryAfter@entry=0x0, returncode=returncode@entry=0x7f9bb27f4c84, returntext=returntext@entry=0x7f9bb27f4f00 "", pdse=<optimized out>) at /root/389TEST/workspaces/389-ds-base/ds/ldap/servers/slapd/dse.c:2663
#7 0x00007f9bc9174c7d in do_dse_search (attrsonly=<optimized out>, attrs=<optimized out>, filter=<optimized out>, basedn=<optimized out>, scope=<optimized out>, pb=0x7f9bb27fbae0, pdse=0xeeb5b0) at /root/389TEST/workspaces/389-ds-base/ds/ldap/servers/slapd/dse.c:1675
#8 dse_search (pb=0x7f9bb27fbae0) at /root/389TEST/workspaces/389-ds-base/ds/ldap/servers/slapd/dse.c:1789
#9 0x00007f9bc91aafa9 in op_shared_search (pb=pb@entry=0x7f9bb27fbae0, send_result=send_result@entry=1) at /root/389TEST/workspaces/389-ds-base/ds/ldap/servers/slapd/opshared.c:823
#10 0x00000000004283cc in do_search (pb=pb@entry=0x7f9bb27fbae0) at /root/389TEST/workspaces/389-ds-base/ds/ldap/servers/slapd/search.c:378
#11 0x000000000041860e in connection_dispatch_operation (pb=0x7f9bb27fbae0, op=0x15aebe0, conn=0x7f9bc9578560) at /root/389TEST/workspaces/389-ds-base/ds/ldap/servers/slapd/connection.c:684
#12 connection_threadmain () at /root/389TEST/workspaces/389-ds-base/ds/ldap/servers/slapd/connection.c:2534
#13 0x00007f9bc75a6e3b in _pt_root () from /lib64/libnspr4.so
#14 0x00007f9bc6f46f35 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f9bc6c75c3d in clone () from /lib64/libc.so.6
(gdb) f 5
#5 0x00007f9bbf305c3c in ldbm_back_monitor_instance_search (pb=<optimized out>, e=0x7f9b9400fe70, entryAfter=<optimized out>, returncode=0x7f9bb27f4c84, returntext=<optimized out>, arg=0x1272880) at /root/389TEST/workspaces/389-ds-base/ds/ldap/servers/slapd/back-ldbm/monitor.c:260
260 slapi_ch_free((void **)&mpfstat);

This looks very close to the customer crash.

With the attached test script I got a crash in 5 out of 10 runs.
I produced 10 crashes with 7 different stack traces, all of them in malloc and all related to ldbm_back_monitor_instance_search. Seeing different crash locations is quite usual for heap corruption.

{{{
#define DB_OPEN(priv, oflags, db, txnid, file, database, type, flags, mode, rval) \
...
    if ((priv)) slapi_rwlock_rdlock((priv)->dblayer_env_lock); \
    (rval) = ((db)->open)((db), (txnid), (file), (database), (type), (flags)|DB_AUTO_COMMIT, (mode)); \
    if ((priv)) slapi_rwlock_unlock((priv)->dblayer_env_lock); \
}}}
Should this be "env" instead of "priv"?

Well, it is of type 'struct dblayer_private_env *', so I called it priv, but maybe it could be penv.

Replying to [comment:9 lkrispen]:

> well it is of type 'struct dblayer_private_env *', so I called it priv, but maybe it could be penv

Ok. I was just confused because everywhere DB_OPEN is used, the first argument is pENV or mypENV, and here the argument is env instead of priv:
{{{
#define DB_OPEN(env, oflags, db, txnid, file, database, type, flags, mode, rval) \
}}}

If it should be priv for the first definition of DB_OPEN, that's fine.
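As a small illustration of why the naming is only a readability question: the parameter name of a function-like macro is a placeholder local to that definition, so two definitions that differ only in that name expand identically for the same argument. The macros and the `mini_env` struct below are hypothetical miniatures, not the real DB_OPEN definitions; a counter stands in for the lock calls.

```c
#include <stddef.h>

/* Hypothetical miniature environment; a counter stands in for the lock. */
struct mini_env {
    int lock_calls;
};

/* Variant whose first parameter is named "priv". */
#define DEMO_OPEN_PRIV(priv, rval) \
    do { \
        if ((priv)) (priv)->lock_calls++; \
        (rval) = 0; \
    } while (0)

/* Same body, but the first parameter is named "env". */
#define DEMO_OPEN_ENV(env, rval) \
    do { \
        if ((env)) (env)->lock_calls++; \
        (rval) = 0; \
    } while (0)

/* Invoke both variants with a caller variable named pENV.
 * Returns the number of simulated lock calls, or -1 on failure. */
int demo_macro_names(void)
{
    struct mini_env e = { 0 };
    struct mini_env *pENV = &e;
    int rc1 = -1;
    int rc2 = -1;

    DEMO_OPEN_PRIV(pENV, rc1);
    DEMO_OPEN_ENV(pENV, rc2);

    return (rc1 == 0 && rc2 == 0) ? e.lock_calls : -1;
}
```

Both invocations touch the same object through `pENV`, so the counter ends at 2 regardless of what the parameter is called inside each definition.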

This issue is taken care of in 1.2.11.
This is not a problem in 1.3.3 and newer since libdb has the fix.

Closing this ticket. Thanks, Ludwig!

Metadata Update from @lkrispen:
- Issue assigned to lkrispen
- Issue set to the milestone: 1.2.11.33

2 years ago
