#48233 Server crashes in ACL_LasFindFlush during shutdown if ACIs contain IP address restrictions
Closed: Fixed. Opened 4 years ago by pj101.

389ds compiled from git on the current 1.3.3 branch (so somewhere between 1.3.3.12 and the future 1.3.3.13) crashes when shutting down. The crash happens only if some modifications have been made to the server since it started. No replication is configured. Reproducible every time.

OS: CentOS 7 x86_64 with all the latest patches.

{{{
Jul 28 21:09:05 ldap-edev systemd: Stopping 389 Directory Server edev....
Jul 28 21:09:05 ldap-edev kernel: ns-slapd[10215]: segfault at 40 ip 00007ff7f8fbac14 sp 00007fff3b6ff130 error 4 in libns-dshttpd.so.0.0.0[7ff7f8f98000+45000]
Jul 28 21:09:05 ldap-edev systemd: dirsrv@edev.service: main process exited, code=dumped, status=11/SEGV
Jul 28 21:09:05 ldap-edev systemd: Unit dirsrv@edev.service entered failed state.
}}}

Typical error messages in system logs (one per server restart or stop):

{{{
Jul 28 14:22:43 ldap-edev kernel: ns-slapd[5570]: segfault at 40 ip 00007f5027956b13 sp 00007ffe8fec7fb0 error 4 in libns-dshttpd.so.0.0.0[7f5027938000+3d000]
Jul 28 19:39:32 ldap-edev kernel: ns-slapd[5932]: segfault at 40 ip 00007f6c9714fb13 sp 00007ffc73cf2c00 error 4 in libns-dshttpd.so.0.0.0[7f6c97131000+3d000]
Jul 28 19:50:10 ldap-edev kernel: ns-slapd[7100]: segfault at 40 ip 00007fdce6975b13 sp 00007ffc78505400 error 4 in libns-dshttpd.so.0.0.0[7fdce6957000+3d000]
Jul 28 19:52:18 ldap-edev kernel: ns-slapd[7231]: segfault at 40 ip 00007fb2dae4cb13 sp 00007ffc1948ec20 error 4 in libns-dshttpd.so.0.0.0 (deleted)[7fb2dae2e000+3d000]
Jul 28 20:42:01 ldap-edev kernel: ns-slapd[8283]: segfault at 40 ip 00007f68f372ec14 sp 00007ffc1daa46d0 error 4 in libns-dshttpd.so.0.0.0[7f68f370c000+45000]
Jul 28 20:45:56 ldap-edev kernel: ns-slapd[8805]: segfault at 40 ip 00007fad8371ec14 sp 00007fff6914abe0 error 4 in libns-dshttpd.so.0.0.0[7fad836fc000+45000]
Jul 28 20:49:05 ldap-edev kernel: ns-slapd[8921]: segfault at 40 ip 00007f841dd22c14 sp 00007ffc14097150 error 4 in libns-dshttpd.so.0.0.0[7f841dd00000+45000]
Jul 28 20:51:16 ldap-edev kernel: ns-slapd[9003]: segfault at 40 ip 00007f1036075c14 sp 00007ffe27cef430 error 4 in libns-dshttpd.so.0.0.0[7f1036053000+45000]
}}}

A debug build shows that the crash happens at {{{ lib/libaccess/register.cpp:276 }}}; the corresponding line is:
{{{ *flush_funcp = (LASFlushFunc_t)PR_HashTableLookup(ACLLasFlushHash, attr_name); }}}

The gdb output is attached.

The previous server branch (in particular, 1.3.2.27) does not crash after importing the same LDIF.


Another observation: if I delete all the ACIs containing expressions like "ip=..." or "ip!=...", the problem disappears, so it seems to be related to ACIs that contain IP addresses.

After some more investigation, it looks like the ACIs with IP restrictions applied at the suffix level are what cause the problem. If ACIs of this sort are present only at lower levels, I cannot reproduce the problem.

The problem is always with one of the ACIs added at the suffix level. The ACL plugin initialization log for this type of rule looks something like this (container 1 appears to be the suffix ACL container):
{{{
NSACLPlugin - Added the ACL: "Enable anonymous read access" to existing container:[1]dc=id,dc=polytechnique,dc=edu
}}}

Well, after more tests, it still crashes with the same stack trace even for ACIs at the sub-suffix level, but less frequently. Maybe it happens only when entries covered by these ACIs have been modified.

In fact, no entry modifications are needed to reproduce the bug. It is enough to run an ldapsearch that is evaluated against the ACI(s) containing IP restrictions; the server then crashes at shutdown. If none of the ACIs evaluated by searches since the server started contain the {{{ ip }}} keyword, the server does not crash at shutdown.

Thanks for the investigation!

Would it be possible for you to run the server with valgrind? This sounds like a memory corruption that should show up in valgrind.

Run the server in valgrind as described at http://www.port389.org/docs/389ds/FAQ/faq.html#memory-growth ?
OK, I'll give it a try.

OK, I've done a valgrind run. I'm not sure it helps much, though: see the attachment, it shows the same stack trace.

Replying to [comment:11 pj101]:

There should be a valgrind output file (named something like /var/tmp/val/slapd.vg.<pid>). This would show some details about any memory corruption. If this file exists, would you please provide it as an attachment?

It is in the attachment (slapd.vg.22728). Here is another attachment, slapd.vg.22813, from a server that was started and shut down cleanly (i.e., without an ldapsearch in between).

This valgrind log shows the cause of the crash: there is a bug that mistakenly assigns 0x40 to some address (instead of an integer?).

slapd.vg.22728 (4.6 KB), added by pj101: Valgrind log

It matches the crash logs, too:
{{{
Jul 28 14:22:43 ldap-edev kernel: ns-slapd[5570]: segfault at 40 ip 00007f5027956b13 sp 00007ffe8fec7fb0 error 4 in libns-dshttpd.so.0.0.0[7f5027938000+3d000]
}}}

I've just tested with the 389ds RPM shipped with CentOS/RHEL 7.1 (389-ds-base-1.3.3.1-16.el7_1.x86_64).

With a "typical" installation created by setup-ds.pl, after adding a user and changing the first ACI to
{{{
aci: (targetattr!="userPassword")(version 3.0; acl "Enable anonymous access"; allow (read, search, compare) (userdn="ldap:///anyone") and (ip="127.0.0.1");)
}}}
I am able to reproduce the crash.

reproduce-crash-ACIs-ip.ldif: LDIF file to import to reproduce the crash

So here are the instructions on how to reproduce the crash (the files {{{ reproduce-setup.inf }}} and {{{ reproduce-crash-ACIs-ip.ldif }}} attached to the ticket should be in the current folder):

{{{
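# Point the setup file at this machine's host name (-i edits the file in place)
sed -i -e "s/ldap-model\.polytechnique\.fr/$(hostname)/" reproduce-setup.inf
# Create the instance non-interactively
setup-ds.pl --silent --file=reproduce-setup.inf
systemctl stop dirsrv.target
sleep 5
# Offline import of the LDIF containing the IP-restricted ACIs
ldif2db -n userRoot -i "$(pwd)/reproduce-crash-ACIs-ip.ldif"
systemctl start dirsrv.target
sleep 5
# A single search evaluated against an ACI with an ip rule is enough
ldapsearch -x -h localhost -b "dc=polytechnique,dc=fr" uid=user1
# The server segfaults during this shutdown
systemctl stop dirsrv.target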
}}}

Your fix looks good to me.

It would be nice to rerun the test case test_ticket48233 with valgrind_enabled and check that no "Invalid" errors are reported.

Replying to [comment:19 nhosoi]:

Valgrind passed (no invalid reads/writes).

0c4eafb..22d315b master -> master
commit 22d315b
Author: Mark Reynolds mreynolds@redhat.com
Date: Mon Aug 17 14:51:17 2015 -0400

48e506d..57c5d35 389-ds-base-1.3.4 -> 389-ds-base-1.3.4
commit 57c5d35

Metadata Update from @nhosoi:
- Issue assigned to mreynolds
- Issue set to the milestone: 1.3.4.4
