#49380 Creating new replication agreement crashes the server
Closed: wontfix 6 years ago Opened 6 years ago by misko.

Issue Description

Creating a new replication agreement with nsds5ReplicaEnabled: True crashes the server.

Package Version and Platform

Fedora 27

389-ds-console-1.2.16-3.fc27.noarch
389-ds-1.2.2-10.fc27.noarch
389-adminutil-1.1.23-4.fc27.x86_64
389-console-1.1.18-3.fc27.noarch
389-ds-base-1.3.7.4-1.fc28.x86_64
389-admin-console-1.1.12-3.fc27.noarch
389-ds-console-doc-1.2.16-3.fc27.noarch
389-dsgw-1.1.11-13.fc27.x86_64
389-ds-base-libs-1.3.7.4-1.fc28.x86_64
389-admin-1.1.46-1.fc27.3.x86_64
389-admin-console-doc-1.1.12-3.fc27.noarch

Steps to reproduce

  1. Install two servers with the default install. For later reference, my hostnames are f-ldap03.sandbox.in.pan-net.eu and f-ldap04.sandbox.in.pan-net.eu.
  2. Create the replication DN on both servers (cn=test,cn=config), password test.
  3. Create a new read-write replica (cn=replica).
  4. Add the replication agreement on the first server (f-ldap03.sandbox.in.pan-net.eu) with this LDIF:
dn: cn=whatever,cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config
objectClass: nsds5replicationagreement
objectClass: top
nsds5replicahost: f-ldap04.sandbox.in.pan-net.eu
nsds5replicaport: 389
nsds5ReplicaBindDN: cn=test,cn=config
nsds5replicabindmethod: SIMPLE
nsds5replicaroot: dc=example,dc=com
description: test
nsds5replicaupdateschedule: 0001-2359 0123456
nsds5replicatedattributelist: (objectclass=*) $ EXCLUDE authorityRevocationList
nsds5replicacredentials: test
nsds5BeginReplicaRefresh: start
nsds5ReplicaEnabled: True

Please note that nsds5ReplicaEnabled contains the invalid value True instead of on.
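
Until a fixed build is installed, an LDIF file can be checked for this mistake before it is passed to ldapadd. The following is a minimal sketch, not part of 389-ds or its tooling: the function name find_invalid_enabled is hypothetical, the accepted values come from the server's own error message ("on" or "off"), and the check is deliberately strict about lowercase, which may be stricter than the server itself. It does not handle base64-encoded values or folded LDIF lines.

```python
# Hypothetical pre-flight check for an LDIF file; not a 389-ds tool.
# Accepted values are taken from the server's error message.
# Assumption: only lowercase "on"/"off" pass; the server may be more lenient.
ALLOWED = {"on", "off"}


def find_invalid_enabled(ldif_text, attr="nsds5ReplicaEnabled"):
    """Return (line_number, value) for every attr value that is not on/off."""
    bad = []
    for lineno, line in enumerate(ldif_text.splitlines(), start=1):
        name, sep, value = line.partition(":")
        # Skip lines without a colon and lines for other attributes.
        # LDAP attribute names are case-insensitive, so compare lowercased.
        if not sep or name.strip().lower() != attr.lower():
            continue
        value = value.strip()
        if value not in ALLOWED:
            bad.append((lineno, value))
    return bad


if __name__ == "__main__":
    sample = "nsds5BeginReplicaRefresh: start\nnsds5ReplicaEnabled: True\n"
    for lineno, value in find_invalid_enabled(sample):
        print(f"line {lineno}: invalid nsds5ReplicaEnabled value {value!r}")
```

Run against the LDIF above, this would flag the last line, which is the one that triggers the crash.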

Actual results

[root@f-ldap03 fedora]# ldapadd -D cn=root -wadmin -h localhost -f crash.ldif
adding new entry "cn=whatever,cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config"
ldap_result: Can't contact LDAP server (-1)

(In fact, the daemon crashed.)

● dirsrv@f-ldap03.service - 389 Directory Server f-ldap03.
   Loaded: loaded (/usr/lib/systemd/system/dirsrv@.service; enabled; vendor preset: disabled)
   Active: failed (Result: signal) since Wed 2017-09-13 12:04:56 CEST; 5min ago
  Process: 23738 ExecStart=/usr/sbin/ns-slapd -D /etc/dirsrv/slapd-f-ldap03 -i /var/run/dirsrv/slapd-f-ldap03.pid (code=killed, signal=SEGV)
  Process: 23733 ExecStartPre=/usr/sbin/ds_systemd_ask_password_acl /etc/dirsrv/slapd-f-ldap03/dse.ldif (code=exited, status=0/SUCCESS)
 Main PID: 23738 (code=killed, signal=SEGV)
   Status: "slapd started: Ready to process requests"

Sep 13 12:04:51 f-ldap03.sandbox.in.pan-net.eu ns-slapd[23738]: [13/Sep/2017:12:04:51.862396510 +0200] - NOTICE - ldbm_back_start - cache autosizing: NetscapeRoot entry cache (2 total): 65536k
Sep 13 12:04:51 f-ldap03.sandbox.in.pan-net.eu ns-slapd[23738]: [13/Sep/2017:12:04:51.885522798 +0200] - NOTICE - ldbm_back_start - cache autosizing: NetscapeRoot dn cache (2 total): 65536k
Sep 13 12:04:51 f-ldap03.sandbox.in.pan-net.eu ns-slapd[23738]: [13/Sep/2017:12:04:51.913912748 +0200] - NOTICE - ldbm_back_start - total cache size: 289193410 B;
Sep 13 12:04:51 f-ldap03.sandbox.in.pan-net.eu ns-slapd[23738]: [13/Sep/2017:12:04:51.939287864 +0200] - NOTICE - dblayer_start - Detected Disorderly Shutdown last time Directory Server was running, recovering database.
Sep 13 12:04:52 f-ldap03.sandbox.in.pan-net.eu ns-slapd[23738]: [13/Sep/2017:12:04:52.530646439 +0200] - INFO - slapd_daemon - slapd started.  Listening on All Interfaces port 389 for LDAP requests
Sep 13 12:04:52 f-ldap03.sandbox.in.pan-net.eu systemd[1]: Started 389 Directory Server f-ldap03..
Sep 13 12:04:56 f-ldap03.sandbox.in.pan-net.eu ns-slapd[23738]: [13/Sep/2017:12:04:56.107134621 +0200] - ERR - NSMMReplicationPlugin - agmt_new_from_entry - Warning invalid value for nsds5ReplicaEnabled (True), value must be "on" or "off".  Ignoring this repl agreement.
Sep 13 12:04:56 f-ldap03.sandbox.in.pan-net.eu systemd[1]: dirsrv@f-ldap03.service: Main process exited, code=killed, status=11/SEGV
Sep 13 12:04:56 f-ldap03.sandbox.in.pan-net.eu systemd[1]: dirsrv@f-ldap03.service: Unit entered failed state.
Sep 13 12:04:56 f-ldap03.sandbox.in.pan-net.eu systemd[1]: dirsrv@f-ldap03.service: Failed with result 'signal'.

On Debian, the log is even funnier:

Sep  1 16:13:39 ldap01 ns-slapd[12032]: *** Error in `/usr/sbin/ns-slapd': malloc(): memory corruption (fast): 0x00007f473c020e3f ***
Sep  1 16:13:39 ldap01 ns-slapd[12032]: ======= Backtrace: =========
Sep  1 16:13:39 ldap01 ns-slapd[12032]: /lib/x86_64-linux-gnu/libc.so.6(+0x7908b)[0x7f47714c108b]
Sep  1 16:13:39 ldap01 ns-slapd[12032]: /lib/x86_64-linux-gnu/libc.so.6(+0x85008)[0x7f47714cd008]
Sep  1 16:13:39 ldap01 ns-slapd[12032]: /lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x54)[0x7f47714ce984]
Sep  1 16:13:39 ldap01 ns-slapd[12032]: /lib/x86_64-linux-gnu/libc.so.6(__strdup+0x1a)[0x7f47714d579a]
Sep  1 16:13:39 ldap01 ns-slapd[12032]: /usr/lib/x86_64-linux-gnu/dirsrv/libslapd.so.0(slapi_ch_strdup+0x13)[0x7f4772ccca33]
Sep  1 16:13:39 ldap01 ns-slapd[12032]: /usr/lib/x86_64-linux-gnu/dirsrv/libslapd.so.0(slapi_sdn_set_dn_byval+0x2d)[0x7f4772cd5add]
Sep  1 16:13:39 ldap01 ns-slapd[12032]: /usr/sbin/ns-slapd(+0x1a2ae)[0x5633607092ae]
Sep  1 16:13:39 ldap01 ns-slapd[12032]: /usr/lib/x86_64-linux-gnu/libnspr4.so(+0x27ed9)[0x7f4771c6fed9]
Sep  1 16:13:39 ldap01 ns-slapd[12032]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76da)[0x7f47718166da]
Sep  1 16:13:39 ldap01 ns-slapd[12032]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x5f)[0x7f4771550d7f]
Sep  1 16:13:39 ldap01 ns-slapd[12032]: ======= Memory map: ========

Expected results

The server does not crash, and the agreement is created.


I can provide Ansible playbooks for reproducing the problem if necessary.

Metadata Update from @mreynolds:
- Issue assigned to mreynolds

6 years ago

The crashing stack

Thread 44 "ns-slapd" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fb659024700 (LWP 22781)]
0x00007fb68af115b7 in unschedule_window_state_change_event (sch=0x0) at ../389-ds-base/ldap/servers/plugins/replication/repl5_schedule.c:588
588     if (sch->pending_event) {
(gdb) p *sch
Cannot access memory at address 0x0
(gdb) where
#0  0x00007fb68af115b7 in unschedule_window_state_change_event (sch=0x0) at ../389-ds-base/ldap/servers/plugins/replication/repl5_schedule.c:588
#1  0x00007fb68af10b7f in schedule_destroy (s=0x0) at ../389-ds-base/ldap/servers/plugins/replication/repl5_schedule.c:134
#2  0x00007fb68aee00d6 in agmt_delete (rap=0x7fb659023168) at ../389-ds-base/ldap/servers/plugins/replication/repl5_agmt.c:634
#3  0x00007fb68aedfddb in agmt_new_from_entry (e=0x4b4cd00) at ../389-ds-base/ldap/servers/plugins/replication/repl5_agmt.c:537
#4  0x00007fb68aee63b4 in add_new_agreement (e=0x4b4cd00) at ../389-ds-base/ldap/servers/plugins/replication/repl5_agmtlist.c:136
#5  0x00007fb68aee673c in agmtlist_add_callback (pb=0x223ff60, e=0x4b4cd00, entryAfter=0x0, returncode=0x7fb6590233d4, returntext=0x7fb659023450 "", arg=0x0)
    at ../389-ds-base/ldap/servers/plugins/replication/repl5_agmtlist.c:232
#6  0x00007fb697a4f4e8 in dse_call_callback (pdse=0x1e86ff0, pb=0x223ff60, operation=16, flags=1, entryBefore=0x4b4cd00, entryAfter=0x0, returncode=0x7fb6590233d4, returntext=0x7fb659023450 "") at ../389-ds-base/ldap/servers/slapd/dse.c:2523
#7  0x00007fb697a4e7f1 in dse_add (pb=0x223ff60) at ../389-ds-base/ldap/servers/slapd/dse.c:2220
#8  0x00007fb697a3140f in op_shared_add (pb=0x223ff60) at ../389-ds-base/ldap/servers/slapd/add.c:671
#9  0x00007fb697a303fd in do_add (pb=0x223ff60) at ../389-ds-base/ldap/servers/slapd/add.c:228
#10 0x0000000000416a9c in connection_dispatch_operation (conn=0x47c9d70, op=0x21e0000, pb=0x223ff60) at ../389-ds-base/ldap/servers/slapd/connection.c:609
#11 0x0000000000418a3a in connection_threadmain () at ../389-ds-base/ldap/servers/slapd/connection.c:1761
#12 0x00007fb6957c470c in _pt_root () at /lib64/libnspr4.so
#13 0x00007fb69516173a in start_thread () at /lib64/libpthread.so.0
#14 0x00007fb694c52e7f in clone () at /lib64/libc.so.6
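
The backtrace shows the cleanup path (agmt_delete calling schedule_destroy) running on an agreement whose schedule pointer is still NULL, apparently because validation of nsds5ReplicaEnabled failed before the schedule was allocated. The sketch below is only a Python model of that pattern and of a defensive guard in the destroy routine. It is not the C code and not the patch attached to this ticket; all class and function names are hypothetical stand-ins.

```python
# Python model of "cleanup runs on a partially constructed object".
# Names are hypothetical stand-ins for the C functions in the backtrace.


class Schedule:
    def __init__(self):
        self.pending_event = None


class Agreement:
    def __init__(self):
        # Nothing is allocated yet; the schedule is attached later.
        self.schedule = None


def schedule_destroy(sch):
    # Guard: tolerate a schedule that was never created.
    # Without this check, the attribute access below would fail on None,
    # which is the Python analogue of dereferencing sch=0x0.
    if sch is None:
        return
    sch.pending_event = None


def agreement_delete(agmt):
    schedule_destroy(agmt.schedule)
    agmt.schedule = None


def agreement_from_entry(enabled):
    agmt = Agreement()
    if enabled not in ("on", "off"):
        # Validation fails before the schedule exists, so cleanup
        # must cope with agmt.schedule being None.
        agreement_delete(agmt)
        return None
    agmt.schedule = Schedule()
    return agmt
```

With the guard in place, agreement_from_entry("True") returns None instead of raising, which models rejecting the invalid agreement while the server keeps running.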

Metadata Update from @mreynolds:
- Custom field component adjusted to None
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None
- Custom field type adjusted to None
- Custom field version adjusted to None

6 years ago

Metadata Update from @mreynolds:
- Custom field reviewstatus adjusted to review (was: None)

6 years ago

The fix looks good.
schedule_set is called with ra->schedule. If ra->schedule can ever be NULL, we may also want to check it in schedule_set.

I don't think it can be NULL when we call schedule_set(), but I put the check in anyway. Revised patch:

0001-Ticket-49380-Crash-when-adding-invalid-replication-a.patch

Thanks Mark, the patch looks good to me. ACK

Metadata Update from @tbordaz:
- Custom field reviewstatus adjusted to ack (was: review)

6 years ago

fe1cfca..610db47 master -> master

5a2b673..3535afb 389-ds-base-1.3.6 -> 389-ds-base-1.3.6

Metadata Update from @mreynolds:
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1491706
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

6 years ago

Metadata Update from @mreynolds:
- Custom field reviewstatus adjusted to review (was: ack)

6 years ago

LGTM,
though, could you please add a proper docstring?

Like this:

"""Test checks that an invalid agreement is properly rejected
and does not crash the server

:id: 6c3b2a7e-edcd-4327-a003-6bd878ff722b
:setup: MMR with four masters
:steps:
    1. Add invalid agreement (nsds5ReplicaEnabled set to invalid value)
    2. Verify the server is still running
:expectedresults:
    1. Invalid repl agreement should be rejected
    2. Server should be still running
"""

Thanks! You have my ack.

One small issue though, the commit message body is a bit too long - 93 chars.
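
A check like this can be automated before pushing. The following is a minimal sketch: the function name overlong_lines is hypothetical, and the 72-character limit is an assumption, since the comment above only says that 93 characters is too long and does not state the project's actual limit.

```python
# Hypothetical helper; the 72-character default is an assumed limit,
# not one stated in this ticket.
def overlong_lines(message, limit=72):
    """Return (line_number, length) for body lines longer than limit.

    Line 1 is treated as the subject and is not checked here.
    """
    return [
        (lineno, len(line))
        for lineno, line in enumerate(message.splitlines(), start=1)
        if lineno > 1 and len(line) > limit
    ]


if __name__ == "__main__":
    msg = "Ticket 49380 - example subject\n\n" + "x" * 93
    for lineno, length in overlong_lines(msg):
        print(f"line {lineno} is {length} chars long")
```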

Metadata Update from @spichugi:
- Custom field reviewstatus adjusted to ack (was: review)

6 years ago

CI Test

3919217..02d76b6 master -> master

3535afb..84109ee 389-ds-base-1.3.6 -> 389-ds-base-1.3.6

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.3.6.0

6 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/2439

If you want to receive further updates on the issue, please navigate to the github issue
and click the subscribe button.

Thank you for understanding. We apologize for any inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: fixed)

3 years ago
