When creating a new replication agreement with `nsds5ReplicaEnabled: True`, the server crashes.
Fedora 27
Installed packages:

```
389-ds-console-1.2.16-3.fc27.noarch
389-ds-1.2.2-10.fc27.noarch
389-adminutil-1.1.23-4.fc27.x86_64
389-console-1.1.18-3.fc27.noarch
389-ds-base-1.3.7.4-1.fc28.x86_64
389-admin-console-1.1.12-3.fc27.noarch
389-ds-console-doc-1.2.16-3.fc27.noarch
389-dsgw-1.1.11-13.fc27.x86_64
389-ds-base-libs-1.3.7.4-1.fc28.x86_64
389-admin-1.1.46-1.fc27.3.x86_64
389-admin-console-doc-1.1.12-3.fc27.noarch
```
f-ldap03.sandbox.in.pan-net.eu
f-ldap04.sandbox.in.pan-net.eu
cn=test,cn=config
test
cn=replica
```
dn: cn=whatever,cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config
objectClass: nsds5replicationagreement
objectClass: top
nsds5replicahost: f-ldap04.sandbox.in.pan-net.eu
nsds5replicaport: 389
nsds5ReplicaBindDN: cn=test,cn=config
nsds5replicabindmethod: SIMPLE
nsds5replicaroot: dc=example,dc=com
description: test
nsds5replicaupdateschedule: 0001-2359 0123456
nsds5replicatedattributelist: (objectclass=*) $ EXCLUDE authorityRevocationList
nsds5replicacredentials: test
nsds5BeginReplicaRefresh: start
nsds5ReplicaEnabled: True
```
Please note that `nsds5ReplicaEnabled` contains the invalid value `True` instead of `on`.
```
[root@f-ldap03 fedora]# ldapadd -D cn=root -wadmin -h localhost -f crash.ldif
adding new entry "cn=whatever,cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config"
ldap_result: Can't contact LDAP server (-1)
```
(actually, the daemon crashed)
```
● dirsrv@f-ldap03.service - 389 Directory Server f-ldap03.
   Loaded: loaded (/usr/lib/systemd/system/dirsrv@.service; enabled; vendor preset: disabled)
   Active: failed (Result: signal) since Wed 2017-09-13 12:04:56 CEST; 5min ago
  Process: 23738 ExecStart=/usr/sbin/ns-slapd -D /etc/dirsrv/slapd-f-ldap03 -i /var/run/dirsrv/slapd-f-ldap03.pid (code=killed, signal=SEGV)
  Process: 23733 ExecStartPre=/usr/sbin/ds_systemd_ask_password_acl /etc/dirsrv/slapd-f-ldap03/dse.ldif (code=exited, status=0/SUCCESS)
 Main PID: 23738 (code=killed, signal=SEGV)
   Status: "slapd started: Ready to process requests"

Sep 13 12:04:51 f-ldap03.sandbox.in.pan-net.eu ns-slapd[23738]: [13/Sep/2017:12:04:51.862396510 +0200] - NOTICE - ldbm_back_start - cache autosizing: NetscapeRoot entry cache (2 total): 65536k
Sep 13 12:04:51 f-ldap03.sandbox.in.pan-net.eu ns-slapd[23738]: [13/Sep/2017:12:04:51.885522798 +0200] - NOTICE - ldbm_back_start - cache autosizing: NetscapeRoot dn cache (2 total): 65536k
Sep 13 12:04:51 f-ldap03.sandbox.in.pan-net.eu ns-slapd[23738]: [13/Sep/2017:12:04:51.913912748 +0200] - NOTICE - ldbm_back_start - total cache size: 289193410 B;
Sep 13 12:04:51 f-ldap03.sandbox.in.pan-net.eu ns-slapd[23738]: [13/Sep/2017:12:04:51.939287864 +0200] - NOTICE - dblayer_start - Detected Disorderly Shutdown last time Directory Server was running, recovering database.
Sep 13 12:04:52 f-ldap03.sandbox.in.pan-net.eu ns-slapd[23738]: [13/Sep/2017:12:04:52.530646439 +0200] - INFO - slapd_daemon - slapd started. Listening on All Interfaces port 389 for LDAP requests
Sep 13 12:04:52 f-ldap03.sandbox.in.pan-net.eu systemd[1]: Started 389 Directory Server f-ldap03..
Sep 13 12:04:56 f-ldap03.sandbox.in.pan-net.eu ns-slapd[23738]: [13/Sep/2017:12:04:56.107134621 +0200] - ERR - NSMMReplicationPlugin - agmt_new_from_entry - Warning invalid value for nsds5ReplicaEnabled (True), value must be "on" or "off". Ignoring this repl agreement.
Sep 13 12:04:56 f-ldap03.sandbox.in.pan-net.eu systemd[1]: dirsrv@f-ldap03.service: Main process exited, code=killed, status=11/SEGV
Sep 13 12:04:56 f-ldap03.sandbox.in.pan-net.eu systemd[1]: dirsrv@f-ldap03.service: Unit entered failed state.
Sep 13 12:04:56 f-ldap03.sandbox.in.pan-net.eu systemd[1]: dirsrv@f-ldap03.service: Failed with result 'signal'.
```
On Debian, the log is even funnier:
```
Sep 1 16:13:39 ldap01 ns-slapd[12032]: *** Error in `/usr/sbin/ns-slapd': malloc(): memory corruption (fast): 0x00007f473c020e3f ***
Sep 1 16:13:39 ldap01 ns-slapd[12032]: ======= Backtrace: =========
Sep 1 16:13:39 ldap01 ns-slapd[12032]: /lib/x86_64-linux-gnu/libc.so.6(+0x7908b)[0x7f47714c108b]
Sep 1 16:13:39 ldap01 ns-slapd[12032]: /lib/x86_64-linux-gnu/libc.so.6(+0x85008)[0x7f47714cd008]
Sep 1 16:13:39 ldap01 ns-slapd[12032]: /lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x54)[0x7f47714ce984]
Sep 1 16:13:39 ldap01 ns-slapd[12032]: /lib/x86_64-linux-gnu/libc.so.6(__strdup+0x1a)[0x7f47714d579a]
Sep 1 16:13:39 ldap01 ns-slapd[12032]: /usr/lib/x86_64-linux-gnu/dirsrv/libslapd.so.0(slapi_ch_strdup+0x13)[0x7f4772ccca33]
Sep 1 16:13:39 ldap01 ns-slapd[12032]: /usr/lib/x86_64-linux-gnu/dirsrv/libslapd.so.0(slapi_sdn_set_dn_byval+0x2d)[0x7f4772cd5add]
Sep 1 16:13:39 ldap01 ns-slapd[12032]: /usr/sbin/ns-slapd(+0x1a2ae)[0x5633607092ae]
Sep 1 16:13:39 ldap01 ns-slapd[12032]: /usr/lib/x86_64-linux-gnu/libnspr4.so(+0x27ed9)[0x7f4771c6fed9]
Sep 1 16:13:39 ldap01 ns-slapd[12032]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76da)[0x7f47718166da]
Sep 1 16:13:39 ldap01 ns-slapd[12032]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x5f)[0x7f4771550d7f]
Sep 1 16:13:39 ldap01 ns-slapd[12032]: ======= Memory map: ========
```
Expected results: not crashing, and creating the agreement.
I can provide Ansible playbooks for replicating the problem if necessary.
Metadata Update from @mreynolds: - Issue assigned to mreynolds
The crashing stack:
```
Thread 44 "ns-slapd" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fb659024700 (LWP 22781)]
0x00007fb68af115b7 in unschedule_window_state_change_event (sch=0x0)
    at ../389-ds-base/ldap/servers/plugins/replication/repl5_schedule.c:588
588         if (sch->pending_event) {
(gdb) p *sch
Cannot access memory at address 0x0
(gdb) where
#0  0x00007fb68af115b7 in unschedule_window_state_change_event (sch=0x0)
    at ../389-ds-base/ldap/servers/plugins/replication/repl5_schedule.c:588
#1  0x00007fb68af10b7f in schedule_destroy (s=0x0)
    at ../389-ds-base/ldap/servers/plugins/replication/repl5_schedule.c:134
#2  0x00007fb68aee00d6 in agmt_delete (rap=0x7fb659023168)
    at ../389-ds-base/ldap/servers/plugins/replication/repl5_agmt.c:634
#3  0x00007fb68aedfddb in agmt_new_from_entry (e=0x4b4cd00)
    at ../389-ds-base/ldap/servers/plugins/replication/repl5_agmt.c:537
#4  0x00007fb68aee63b4 in add_new_agreement (e=0x4b4cd00)
    at ../389-ds-base/ldap/servers/plugins/replication/repl5_agmtlist.c:136
#5  0x00007fb68aee673c in agmtlist_add_callback (pb=0x223ff60, e=0x4b4cd00, entryAfter=0x0,
    returncode=0x7fb6590233d4, returntext=0x7fb659023450 "", arg=0x0)
    at ../389-ds-base/ldap/servers/plugins/replication/repl5_agmtlist.c:232
#6  0x00007fb697a4f4e8 in dse_call_callback (pdse=0x1e86ff0, pb=0x223ff60, operation=16, flags=1,
    entryBefore=0x4b4cd00, entryAfter=0x0, returncode=0x7fb6590233d4, returntext=0x7fb659023450 "")
    at ../389-ds-base/ldap/servers/slapd/dse.c:2523
#7  0x00007fb697a4e7f1 in dse_add (pb=0x223ff60) at ../389-ds-base/ldap/servers/slapd/dse.c:2220
#8  0x00007fb697a3140f in op_shared_add (pb=0x223ff60) at ../389-ds-base/ldap/servers/slapd/add.c:671
#9  0x00007fb697a303fd in do_add (pb=0x223ff60) at ../389-ds-base/ldap/servers/slapd/add.c:228
#10 0x0000000000416a9c in connection_dispatch_operation (conn=0x47c9d70, op=0x21e0000, pb=0x223ff60)
    at ../389-ds-base/ldap/servers/slapd/connection.c:609
#11 0x0000000000418a3a in connection_threadmain ()
    at ../389-ds-base/ldap/servers/slapd/connection.c:1761
#12 0x00007fb6957c470c in _pt_root () at /lib64/libnspr4.so
#13 0x00007fb69516173a in start_thread () at /lib64/libpthread.so.0
#14 0x00007fb694c52e7f in clone () at /lib64/libc.so.6
```
Metadata Update from @mreynolds: - Custom field component adjusted to None - Custom field origin adjusted to None - Custom field reviewstatus adjusted to None - Custom field type adjusted to None - Custom field version adjusted to None
Attached patch: [0001-Ticket-49380-Crash-when-adding-invalid-replication-a.patch](/389-ds-base/issue/raw/files/8f8a7748efaebbe939fa0d9387522587888bb2f76c1308259e75c3161777927e-0001-Ticket-49380-Crash-when-adding-invalid-replication-a.patch)
Metadata Update from @mreynolds: - Custom field reviewstatus adjusted to review (was: None)
The fix looks good. `schedule_set` is called with `ra->schedule`. If it happens that `ra->schedule` can be NULL, we may also want to test its value in `schedule_set`.
I don't think it can be NULL when we call `schedule_set()`, but I put the check in anyway. Revised patch:
Attached patch: [0001-Ticket-49380-Crash-when-adding-invalid-replication-a.patch](/389-ds-base/issue/raw/files/f393d66e802582cd22e15376ae34a73489cebd8d2c5ec5807112d5ad095b76d0-0001-Ticket-49380-Crash-when-adding-invalid-replication-a.patch)
Thanks Mark, the patch looks good to me. ACK
Metadata Update from @tbordaz: - Custom field reviewstatus adjusted to ack (was: review)
fe1cfca..610db47 master -> master
5a2b673..3535afb 389-ds-base-1.3.6 -> 389-ds-base-1.3.6
Metadata Update from @mreynolds: - Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1491706 - Issue close_status updated to: fixed - Issue status updated to: Closed (was: Open)
Add CI test
Attached patch: [0001-Ticket-49380-Add-CI-test.patch](/389-ds-base/issue/raw/files/deb567b0489df055329680573d24f60b0593e2c29506d7fbfe113d0821bbb806-0001-Ticket-49380-Add-CI-test.patch)
Metadata Update from @mreynolds: - Custom field reviewstatus adjusted to review (was: ack)
LGTM, though, could you please add a proper docstring?
Like this:
"""Test checks that an invalid agreement is properly rejected and does not crash the server :id: 6c3b2a7e-edcd-4327-a003-6bd878ff722b :setup: MMR with four masters :steps: 1. Add invalid agreement (nsds5ReplicaEnabled set to invalid value) 2. Verify the server is still running :expectedresults: 1. Invalid repl agreement should be rejected 2. Server should be still running """
Newly revised patch:
Attached patch: [0001-Ticket-49380-Add-CI-test.patch](/389-ds-base/issue/raw/files/aaefd4d6192d641aeb9e11518af1f3cd0f5f17de4f85c76738ed8c820f9def7b-0001-Ticket-49380-Add-CI-test.patch)
Thanks! You have my ack.
One small issue though: the commit message body is a bit too long (93 chars).
Metadata Update from @spichugi: - Custom field reviewstatus adjusted to ack (was: review)
CI Test
3919217..02d76b6 master -> master
3535afb..84109ee 389-ds-base-1.3.6 -> 389-ds-base-1.3.6
Metadata Update from @mreynolds: - Issue set to the milestone: 1.3.6.0
389-ds-base is moving from Pagure to GitHub. This means that new issues and pull requests will be accepted only in 389-ds-base's GitHub repository.
This issue has been cloned to GitHub and is available here: https://github.com/389ds/389-ds-base/issues/2439
If you want to receive further updates on the issue, please navigate to the GitHub issue and click on the subscribe button.
Thank you for understanding. We apologize for any inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: fixed)