When creating a new replication agreement with `nsds5ReplicaEnabled: True`, the server crashes.
Fedora 27
Installed packages:

```
389-ds-console-1.2.16-3.fc27.noarch
389-ds-1.2.2-10.fc27.noarch
389-adminutil-1.1.23-4.fc27.x86_64
389-console-1.1.18-3.fc27.noarch
389-ds-base-1.3.7.4-1.fc28.x86_64
389-admin-console-1.1.12-3.fc27.noarch
389-ds-console-doc-1.2.16-3.fc27.noarch
389-dsgw-1.1.11-13.fc27.x86_64
389-ds-base-libs-1.3.7.4-1.fc28.x86_64
389-admin-1.1.46-1.fc27.3.x86_64
389-admin-console-doc-1.1.12-3.fc27.noarch
```
f-ldap03.sandbox.in.pan-net.eu
f-ldap04.sandbox.in.pan-net.eu
cn=test,cn=config
test
cn=replica
```
dn: cn=whatever,cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config
objectClass: nsds5replicationagreement
objectClass: top
nsds5replicahost: f-ldap04.sandbox.in.pan-net.eu
nsds5replicaport: 389
nsds5ReplicaBindDN: cn=test,cn=config
nsds5replicabindmethod: SIMPLE
nsds5replicaroot: dc=example,dc=com
description: test
nsds5replicaupdateschedule: 0001-2359 0123456
nsds5replicatedattributelist: (objectclass=*) $ EXCLUDE authorityRevocationList
nsds5replicacredentials: test
nsds5BeginReplicaRefresh: start
nsds5ReplicaEnabled: True
```
Please note that `nsds5ReplicaEnabled` contains the invalid value `True` instead of `on`.
```
[root@f-ldap03 fedora]# ldapadd -D cn=root -wadmin -h localhost -f crash.ldif
adding new entry "cn=whatever,cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config"
ldap_result: Can't contact LDAP server (-1)
```
(actually, the daemon crashed)
```
● dirsrv@f-ldap03.service - 389 Directory Server f-ldap03.
   Loaded: loaded (/usr/lib/systemd/system/dirsrv@.service; enabled; vendor preset: disabled)
   Active: failed (Result: signal) since Wed 2017-09-13 12:04:56 CEST; 5min ago
  Process: 23738 ExecStart=/usr/sbin/ns-slapd -D /etc/dirsrv/slapd-f-ldap03 -i /var/run/dirsrv/slapd-f-ldap03.pid (code=killed, signal=SEGV)
  Process: 23733 ExecStartPre=/usr/sbin/ds_systemd_ask_password_acl /etc/dirsrv/slapd-f-ldap03/dse.ldif (code=exited, status=0/SUCCESS)
 Main PID: 23738 (code=killed, signal=SEGV)
   Status: "slapd started: Ready to process requests"

Sep 13 12:04:51 f-ldap03.sandbox.in.pan-net.eu ns-slapd[23738]: [13/Sep/2017:12:04:51.862396510 +0200] - NOTICE - ldbm_back_start - cache autosizing: NetscapeRoot entry cache (2 total): 65536k
Sep 13 12:04:51 f-ldap03.sandbox.in.pan-net.eu ns-slapd[23738]: [13/Sep/2017:12:04:51.885522798 +0200] - NOTICE - ldbm_back_start - cache autosizing: NetscapeRoot dn cache (2 total): 65536k
Sep 13 12:04:51 f-ldap03.sandbox.in.pan-net.eu ns-slapd[23738]: [13/Sep/2017:12:04:51.913912748 +0200] - NOTICE - ldbm_back_start - total cache size: 289193410 B;
Sep 13 12:04:51 f-ldap03.sandbox.in.pan-net.eu ns-slapd[23738]: [13/Sep/2017:12:04:51.939287864 +0200] - NOTICE - dblayer_start - Detected Disorderly Shutdown last time Directory Server was running, recovering database.
Sep 13 12:04:52 f-ldap03.sandbox.in.pan-net.eu ns-slapd[23738]: [13/Sep/2017:12:04:52.530646439 +0200] - INFO - slapd_daemon - slapd started. Listening on All Interfaces port 389 for LDAP requests
Sep 13 12:04:52 f-ldap03.sandbox.in.pan-net.eu systemd[1]: Started 389 Directory Server f-ldap03..
Sep 13 12:04:56 f-ldap03.sandbox.in.pan-net.eu ns-slapd[23738]: [13/Sep/2017:12:04:56.107134621 +0200] - ERR - NSMMReplicationPlugin - agmt_new_from_entry - Warning invalid value for nsds5ReplicaEnabled (True), value must be "on" or "off". Ignoring this repl agreement.
Sep 13 12:04:56 f-ldap03.sandbox.in.pan-net.eu systemd[1]: dirsrv@f-ldap03.service: Main process exited, code=killed, status=11/SEGV
Sep 13 12:04:56 f-ldap03.sandbox.in.pan-net.eu systemd[1]: dirsrv@f-ldap03.service: Unit entered failed state.
Sep 13 12:04:56 f-ldap03.sandbox.in.pan-net.eu systemd[1]: dirsrv@f-ldap03.service: Failed with result 'signal'.
```
On Debian, the log is even funnier:
```
Sep 1 16:13:39 ldap01 ns-slapd[12032]: *** Error in `/usr/sbin/ns-slapd': malloc(): memory corruption (fast): 0x00007f473c020e3f ***
Sep 1 16:13:39 ldap01 ns-slapd[12032]: ======= Backtrace: =========
Sep 1 16:13:39 ldap01 ns-slapd[12032]: /lib/x86_64-linux-gnu/libc.so.6(+0x7908b)[0x7f47714c108b]
Sep 1 16:13:39 ldap01 ns-slapd[12032]: /lib/x86_64-linux-gnu/libc.so.6(+0x85008)[0x7f47714cd008]
Sep 1 16:13:39 ldap01 ns-slapd[12032]: /lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x54)[0x7f47714ce984]
Sep 1 16:13:39 ldap01 ns-slapd[12032]: /lib/x86_64-linux-gnu/libc.so.6(__strdup+0x1a)[0x7f47714d579a]
Sep 1 16:13:39 ldap01 ns-slapd[12032]: /usr/lib/x86_64-linux-gnu/dirsrv/libslapd.so.0(slapi_ch_strdup+0x13)[0x7f4772ccca33]
Sep 1 16:13:39 ldap01 ns-slapd[12032]: /usr/lib/x86_64-linux-gnu/dirsrv/libslapd.so.0(slapi_sdn_set_dn_byval+0x2d)[0x7f4772cd5add]
Sep 1 16:13:39 ldap01 ns-slapd[12032]: /usr/sbin/ns-slapd(+0x1a2ae)[0x5633607092ae]
Sep 1 16:13:39 ldap01 ns-slapd[12032]: /usr/lib/x86_64-linux-gnu/libnspr4.so(+0x27ed9)[0x7f4771c6fed9]
Sep 1 16:13:39 ldap01 ns-slapd[12032]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76da)[0x7f47718166da]
Sep 1 16:13:39 ldap01 ns-slapd[12032]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x5f)[0x7f4771550d7f]
Sep 1 16:13:39 ldap01 ns-slapd[12032]: ======= Memory map: ========
```
Expected results: not crashing, and creating the agreement.
I can provide Ansible playbooks for replicating the problem if necessary.
Metadata Update from @mreynolds: - Issue assigned to mreynolds
The crashing stack:
```
Thread 44 "ns-slapd" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fb659024700 (LWP 22781)]
0x00007fb68af115b7 in unschedule_window_state_change_event (sch=0x0)
    at ../389-ds-base/ldap/servers/plugins/replication/repl5_schedule.c:588
588         if (sch->pending_event) {
(gdb) p *sch
Cannot access memory at address 0x0
(gdb) where
#0  0x00007fb68af115b7 in unschedule_window_state_change_event (sch=0x0)
    at ../389-ds-base/ldap/servers/plugins/replication/repl5_schedule.c:588
#1  0x00007fb68af10b7f in schedule_destroy (s=0x0)
    at ../389-ds-base/ldap/servers/plugins/replication/repl5_schedule.c:134
#2  0x00007fb68aee00d6 in agmt_delete (rap=0x7fb659023168)
    at ../389-ds-base/ldap/servers/plugins/replication/repl5_agmt.c:634
#3  0x00007fb68aedfddb in agmt_new_from_entry (e=0x4b4cd00)
    at ../389-ds-base/ldap/servers/plugins/replication/repl5_agmt.c:537
#4  0x00007fb68aee63b4 in add_new_agreement (e=0x4b4cd00)
    at ../389-ds-base/ldap/servers/plugins/replication/repl5_agmtlist.c:136
#5  0x00007fb68aee673c in agmtlist_add_callback (pb=0x223ff60, e=0x4b4cd00, entryAfter=0x0,
    returncode=0x7fb6590233d4, returntext=0x7fb659023450 "", arg=0x0)
    at ../389-ds-base/ldap/servers/plugins/replication/repl5_agmtlist.c:232
#6  0x00007fb697a4f4e8 in dse_call_callback (pdse=0x1e86ff0, pb=0x223ff60, operation=16, flags=1,
    entryBefore=0x4b4cd00, entryAfter=0x0, returncode=0x7fb6590233d4, returntext=0x7fb659023450 "")
    at ../389-ds-base/ldap/servers/slapd/dse.c:2523
#7  0x00007fb697a4e7f1 in dse_add (pb=0x223ff60) at ../389-ds-base/ldap/servers/slapd/dse.c:2220
#8  0x00007fb697a3140f in op_shared_add (pb=0x223ff60) at ../389-ds-base/ldap/servers/slapd/add.c:671
#9  0x00007fb697a303fd in do_add (pb=0x223ff60) at ../389-ds-base/ldap/servers/slapd/add.c:228
#10 0x0000000000416a9c in connection_dispatch_operation (conn=0x47c9d70, op=0x21e0000, pb=0x223ff60)
    at ../389-ds-base/ldap/servers/slapd/connection.c:609
#11 0x0000000000418a3a in connection_threadmain ()
    at ../389-ds-base/ldap/servers/slapd/connection.c:1761
#12 0x00007fb6957c470c in _pt_root () at /lib64/libnspr4.so
#13 0x00007fb69516173a in start_thread () at /lib64/libpthread.so.0
#14 0x00007fb694c52e7f in clone () at /lib64/libc.so.6
```
Metadata Update from @mreynolds: - Custom field component adjusted to None - Custom field origin adjusted to None - Custom field reviewstatus adjusted to None - Custom field type adjusted to None - Custom field version adjusted to None
Attached patch: [0001-Ticket-49380-Crash-when-adding-invalid-replication-a.patch](/389-ds-base/issue/raw/files/8f8a7748efaebbe939fa0d9387522587888bb2f76c1308259e75c3161777927e-0001-Ticket-49380-Crash-when-adding-invalid-replication-a.patch)
Metadata Update from @mreynolds: - Custom field reviewstatus adjusted to review (was: None)
The fix looks good. `schedule_set` is called with `ra->schedule`. If it happens that `ra->schedule` can be NULL, we may also want to test its value in `schedule_set`.
I don't think it can be NULL when we call `schedule_set()`, but I put the check in anyway. Revised patch:
Attached patch: [0001-Ticket-49380-Crash-when-adding-invalid-replication-a.patch](/389-ds-base/issue/raw/files/f393d66e802582cd22e15376ae34a73489cebd8d2c5ec5807112d5ad095b76d0-0001-Ticket-49380-Crash-when-adding-invalid-replication-a.patch)
Thanks Mark, the patch looks good to me. ACK
Metadata Update from @tbordaz: - Custom field reviewstatus adjusted to ack (was: review)
fe1cfca..610db47 master -> master
5a2b673..3535afb 389-ds-base-1.3.6 -> 389-ds-base-1.3.6
Metadata Update from @mreynolds: - Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1491706 - Issue close_status updated to: fixed - Issue status updated to: Closed (was: Open)
Add CI test
Attached patch: [0001-Ticket-49380-Add-CI-test.patch](/389-ds-base/issue/raw/files/deb567b0489df055329680573d24f60b0593e2c29506d7fbfe113d0821bbb806-0001-Ticket-49380-Add-CI-test.patch)
Metadata Update from @mreynolds: - Custom field reviewstatus adjusted to review (was: ack)
LGTM, though, could you please add a proper docstring?
Like this:
"""Test checks that an invalid agreement is properly rejected and does not crash the server :id: 6c3b2a7e-edcd-4327-a003-6bd878ff722b :setup: MMR with four masters :steps: 1. Add invalid agreement (nsds5ReplicaEnabled set to invalid value) 2. Verify the server is still running :expectedresults: 1. Invalid repl agreement should be rejected 2. Server should be still running """
Newly revised patch:
Attached patch: [0001-Ticket-49380-Add-CI-test.patch](/389-ds-base/issue/raw/files/aaefd4d6192d641aeb9e11518af1f3cd0f5f17de4f85c76738ed8c820f9def7b-0001-Ticket-49380-Add-CI-test.patch)
Thanks! You have my ack.
One small issue though: the commit message body is a bit too long (93 chars).
Metadata Update from @spichugi: - Custom field reviewstatus adjusted to ack (was: review)
CI Test
3919217..02d76b6 master -> master
3535afb..84109ee 389-ds-base-1.3.6 -> 389-ds-base-1.3.6
Metadata Update from @mreynolds: - Issue set to the milestone: 1.3.6.0
389-ds-base is moving from Pagure to GitHub. This means that new issues and pull requests will be accepted only in 389-ds-base's GitHub repository.
This issue has been cloned to GitHub and is available here: https://github.com/389ds/389-ds-base/issues/2439
If you want to receive further updates on the issue, please navigate to the GitHub issue and click on the subscribe button.
Thank you for understanding. We apologize for any inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: fixed)