#25 ns-slapd hangs in db functions under load
Closed: wontfix None Opened 12 years ago by mkosek.

https://bugzilla.redhat.com/show_bug.cgi?id=742582

The FreeIPA development team reported an issue with ns-slapd hanging when under
load from IPA (https://fedorahosted.org/freeipa/ticket/1885). I am able to
reproduce this issue on F15 x86_64.

The hang looks to be some sort of deadlock within libdb.  When the hang occurs,
the ns-slapd process goes to 100% CPU and doesn't progress.  I built a debug
version of 389-ds-base and db4, and was able to get some interesting stack
traces from 3 threads that seem to contribute to this problem.  There is one
thread doing a write to the memberOf index db, another thread doing a read from
the memberOf index db, and the checkpoint thread attempting to do a checkpoint.
These threads all seem to be yielding or waiting for a mutex.

The db_stat tool doesn't show anything waiting for locks, so it seems to be
something more internal to libdb.

batch move to milestone 1.3

I attempted to reproduce this again today on F16 using the 389-ds-base-1.2.10.rc1 bits, as I was having trouble reproducing this the last time I looked at it. I am still able to reproduce the hang with the test scripts that are attached to the original bug, but the hang does not occur every time.

The current test script has a loop that performs 3000 membership iterations. I would recommend increasing this to 5000 or more. Because the issue is timing related, the test may still finish successfully on some runs.
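Since the hang is intermittent, one way to catch it is to run the reproducer repeatedly under a timeout and treat a run that never returns as a hang. The following is only a minimal sketch of that idea, not the attached test script: the function names, the timeout value, and the command being run are all assumptions made for illustration.

```python
import subprocess


def run_with_hang_detection(cmd, timeout_s):
    """Run cmd once and classify the outcome.

    Returns "ok" (exit status 0), "failed" (non-zero exit status),
    or "hung" (the command did not finish within timeout_s seconds).
    """
    try:
        proc = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return "hung"
    return "ok" if proc.returncode == 0 else "failed"


def repeat_until_hang(cmd, attempts, timeout_s):
    """Run cmd up to `attempts` times.

    Returns the 1-based number of the first attempt that hung,
    or None if every attempt finished within the timeout.
    """
    for attempt in range(1, attempts + 1):
        if run_with_hang_detection(cmd, timeout_s) == "hung":
            return attempt
    return None
```

For example, `repeat_until_hang(["./dshang.sh"], attempts=20, timeout_s=3600)` would report which of 20 runs first exceeded one hour; the script path and the numbers here are hypothetical. Note that the timeout only kills the test client, so a hung ns-slapd would still need to be inspected and restarted separately.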

results of the reproducer dshang

The test was run on Fedora 15 x86_64 (dual core).

The attachment dshang_result.tar.gz contains:
with_db4-4.8.30/db_stat.dshang
with_db4-4.8.30/typescript.dshang
with_db4-4.8.30/dshang.out
with_db5-5.1.25-3/db_stat.dshang
with_db5-5.1.25-3/typescript.dshang
with_db5-5.1.25-3/dshang.out

The bad news is that the deadlock is observed with both db4 and db5.

Another observation: when I run the same dshang test on Fedora 16 x86_64 (dual core), the hang does not occur with either db4 or db5.

I ran the dshang test script on RHEL6.3:
Red Hat Enterprise Linux Server release 6.3 Beta (Santiago)
# lscpu
Architecture: x86_64
CPU(s): 8

The test successfully ran for 36 hours.

Note: In this test, DS is linked to db4-4.7.25-16.el6.x86_64; there is no db5 (libdb) available on RHEL6.

# yum install libdb
No package libdb available.

Since the symptom is observed only on Fedora 15 and not on any other platform, we are closing this ticket for now.

Added initial screened field value.

Metadata Update from @nhosoi:
- Issue assigned to nhosoi
- Issue set to the milestone: 1.2.11.rc1

7 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/25

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Invalid)

3 years ago
