389ds compiled from git for the current 1.3.3 branch (so somewhere between 1.3.3.12 and the future 1.3.3.13) crashes when shutting down. The crash happens only if some modifications were made to the server since it started. No replication. Reproducible every time.
OS: CentOS 7 x86_64 with all the latest patches.
{{{
Jul 28 21:09:05 ldap-edev systemd: Stopping 389 Directory Server edev....
Jul 28 21:09:05 ldap-edev kernel: ns-slapd[10215]: segfault at 40 ip 00007ff7f8fbac14 sp 00007fff3b6ff130 error 4 in libns-dshttpd.so.0.0.0[7ff7f8f98000+45000]
Jul 28 21:09:05 ldap-edev systemd: dirsrv@edev.service: main process exited, code=dumped, status=11/SEGV
Jul 28 21:09:05 ldap-edev systemd: Unit dirsrv@edev.service entered failed state.
}}}
Typical error messages in system logs (one per server restart or stop):
{{{
Jul 28 14:22:43 ldap-edev kernel: ns-slapd[5570]: segfault at 40 ip 00007f5027956b13 sp 00007ffe8fec7fb0 error 4 in libns-dshttpd.so.0.0.0[7f5027938000+3d000]
Jul 28 19:39:32 ldap-edev kernel: ns-slapd[5932]: segfault at 40 ip 00007f6c9714fb13 sp 00007ffc73cf2c00 error 4 in libns-dshttpd.so.0.0.0[7f6c97131000+3d000]
Jul 28 19:50:10 ldap-edev kernel: ns-slapd[7100]: segfault at 40 ip 00007fdce6975b13 sp 00007ffc78505400 error 4 in libns-dshttpd.so.0.0.0[7fdce6957000+3d000]
Jul 28 19:52:18 ldap-edev kernel: ns-slapd[7231]: segfault at 40 ip 00007fb2dae4cb13 sp 00007ffc1948ec20 error 4 in libns-dshttpd.so.0.0.0 (deleted)[7fb2dae2e000+3d000]
Jul 28 20:42:01 ldap-edev kernel: ns-slapd[8283]: segfault at 40 ip 00007f68f372ec14 sp 00007ffc1daa46d0 error 4 in libns-dshttpd.so.0.0.0[7f68f370c000+45000]
Jul 28 20:45:56 ldap-edev kernel: ns-slapd[8805]: segfault at 40 ip 00007fad8371ec14 sp 00007fff6914abe0 error 4 in libns-dshttpd.so.0.0.0[7fad836fc000+45000]
Jul 28 20:49:05 ldap-edev kernel: ns-slapd[8921]: segfault at 40 ip 00007f841dd22c14 sp 00007ffc14097150 error 4 in libns-dshttpd.so.0.0.0[7f841dd00000+45000]
Jul 28 20:51:16 ldap-edev kernel: ns-slapd[9003]: segfault at 40 ip 00007f1036075c14 sp 00007ffe27cef430 error 4 in libns-dshttpd.so.0.0.0[7f1036053000+45000]
}}}
The debug build shows that it happens in {{{ lib/libaccess/register.cpp:276 }}}; the corresponding line is:
{{{
*flush_funcp = (LASFlushFunc_t)PR_HashTableLookup(ACLLasFlushHash, attr_name);
}}}
The gdb output is attached.
The previous server branch (in particular, 1.3.2.27) does not crash after importing the same ldif.
gdb dump analysis stacktrace.1438110619.txt
Another observation: if I delete all the ACIs containing expressions like "ip=..." or "ip!=...", the problem disappears. So it must somehow be related to the ACIs with IP addresses in them.
After some more investigation, it looks like the ACIs with IP restrictions applied at the suffix level are what cause the problem. If ACIs of this sort are present only at lower levels, I cannot reproduce the problem.
The problem always involves one of the ACIs added at the suffix level. For this type of rule, the ACL plugin initialization log looks something like this (container 1 appears to be the suffix ACL container): {{{ NSACLPlugin - Added the ACL: "Enable anonymous read access" to existing container:[1]dc=id,dc=polytechnique,dc=edu }}}
Well, after more tests: even with these ACLs at sub-suffix levels, it still crashes with the same stack trace, though less frequently. Maybe it happens only when the entries covered by these ACLs are modified.
In fact, no modifications of entries are needed to reproduce the bug. It is enough to run an ldapsearch that hits the ACI(s) containing IP restrictions: the server will then crash at shutdown. If none of the ACIs evaluated during searches since the server started contain the {{{ ip }}} keyword, the server does not crash at shutdown.
Thanks for the investigation!
Would it be possible for you to run the server with valgrind? This sounds like a memory corruption that should show up in valgrind.
Run the server under valgrind as described at http://www.port389.org/docs/389ds/FAQ/faq.html#memory-growth? OK, I'll give it a try.
OK, I've done a valgrind run. I'm not sure it helps much: see the attachment, it's the same stack trace.
Valgrind log slapd.vg.22728
Replying to [comment:11 pj101]:
There should be a valgrind output file (named something like /var/tmp/val/slapd.vg.<pid>). This would show some details about any memory corruption. If this file exists, would you please provide it as an attachment?
It is in the attachment (slapd.vg.22728). Here is another attachment, slapd.vg.22813, from a server that was started and shut down normally (i.e. without an ldapsearch in between).
attachment slapd.vg.22813
This valgrind log shows the cause of the crash... There is a bug that mistakenly uses 0x40 as an address (an integer stored where a pointer is expected?)
slapd.vg.22728 (4.6 KB), valgrind log added by pj101.
It matches the crash logs, too:
{{{
Jul 28 14:22:43 ldap-edev kernel: ns-slapd[5570]: segfault at 40 ip 00007f5027956b13 sp 00007ffe8fec7fb0 error 4 in libns-dshttpd.so.0.0.0[7f5027938000+3d000]
}}}
I've just tested with the 389ds RPM shipped with CentOS/RHEL 7.1 (389-ds-base-1.3.3.1-16.el7_1.x86_64).
With a "typical" installation by setup-ds.pl, after adding a user and changing the first ACI to:
{{{ aci: (targetattr!="userPassword")(version 3.0; acl "Enable anonymous access"; allow (read, search, compare) (userdn="ldap:///anyone") and (ip="127.0.0.1");) }}}
I am able to reproduce the crash.
inf file for setup-ds.pl reproduce-setup.inf
ldif file to import to reproduce the crash reproduce-crash-ACIs-ip.ldif
So here are the instructions on how to reproduce the crash (the files {{{ reproduce-setup.inf }}} and {{{ reproduce-crash-ACIs-ip.ldif }}} attached to the ticket should be in the current folder):
{{{
sed -i -e "s/ldap-model.polytechnique.fr/`hostname`/" reproduce-setup.inf
setup-ds.pl --silent --file=reproduce-setup.inf
systemctl stop dirsrv.target
sleep 5
ldif2db -n userRoot -i `pwd`/reproduce-crash-ACIs-ip.ldif
systemctl start dirsrv.target
sleep 5
ldapsearch -x -h localhost -b "dc=polytechnique,dc=fr" uid=user1
systemctl stop dirsrv.target
}}}
attachment 0001-Ticket-48233-Server-crashes-in-ACL_LasFindFlush-duri.patch
Your fix looks good to me.
It'd be nice to repeat the test case test_ticket48233 with valgrind_enabled and check there is no Invalid reported?
Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1254344
Replying to [comment:19 nhosoi]:
Your fix looks good to me. It'd be nice to repeat the test case test_ticket48233 with valgrind_enabled and check there is no Invalid reported?
Valgrind passed (no invalid reads/writes).
{{{
0c4eafb..22d315b  master -> master
commit 22d315b
Author: Mark Reynolds mreynolds@redhat.com
Date:   Mon Aug 17 14:51:17 2015 -0400

48e506d..57c5d35  389-ds-base-1.3.4 -> 389-ds-base-1.3.4
commit 57c5d35
}}}
Metadata Update from @nhosoi: - Issue assigned to mreynolds - Issue set to the milestone: 1.3.4.4
389-ds-base is moving from Pagure to GitHub. This means that new issues and pull requests will be accepted only in 389-ds-base's GitHub repository.
This issue has been cloned to GitHub and is available here: - https://github.com/389ds/389-ds-base/issues/1564
If you want to receive further updates on the issue, please navigate to the GitHub issue and click the subscribe button.
Thank you for understanding. We apologize for any inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: Fixed)