Replace slapi's direct NSPR thread control and event system with nunc-stans. This includes timer tasks, async events, and the worker threads themselves.
http://www.port389.org/docs/389ds/design/nunc-stans-workers.html
This can be targeted for 1.3.7 or later.
Metadata Update from @firstyear: - Issue assigned to firstyear - Issue set to the milestone: 1.3.7 backlog
Metadata Update from @firstyear: - Issue close_status updated to: None - Issue tagged with: Complex, Performance
Attachment: 0001-Ticket-49099-ns-workers-prep.patch
Metadata Update from @firstyear: - Custom field reviewstatus adjusted to review
Attachment (updated): 0001-Ticket-49099-ns-workers-prep.patch
This patch serves a number of purposes:
An important aspect of this patch is that it was tested with config.enable_nunc_stans both on and off. This means that if we have issues, we can still turn nunc-stans off and back out of the change.
Looks good! Nice and clean, no indentation/spacing issues :) Thanks!
Thanks!
It looks like a combination of things has a problem building though. :(
commit 19f676a To ssh://git@pagure.io/389-ds-base.git 54e4fca..19f676a master -> master
Attachment: 0001-Ticket-49099-fix-configure.ac-due-to-NS-change.patch
Metadata Update from @firstyear: - Custom field reviewstatus adjusted to review (was: ack)
Metadata Update from @mreynolds: - Custom field reviewstatus adjusted to ack (was: review)
commit a05cf36 To ssh://git@pagure.io/389-ds-base.git 7b3e401..d7a4910 master -> master
This "fix" broke the server. The server fails to install or start or stop, etc.
Attaching gdb and running the startup shows:
... ...
[New Thread 0x7fff257fa700 (LWP 15335)]
new_ns_job acdda0 initial NS_JOB_WAITING
ns_add_io_job state 7 moving to NS_JOB_ARMED
internal_ns_rearm_job acdda0 state 4 moving to NS_JOB_ARMED
event_q_notify enqueuing acdda0 with state 5
sds_queue_dequeue: Queue 0x8e1ac0 - <== enqueuing
sds_queue_enqueue: Queue 0x8e1ac0 - Queueing ptr 0xacdda0 to 0xaaeab0
sds_queue_enqueue: Queue 0x8e1ac0 - empty, adding 0xaaeab0 to head and tail
sds_queue_enqueue: Queue 0x8e1ac0 - complete head: 0xaaeab0 tail: 0xaaeab0
event_q_wake attempting to wake event queue.
event_q_wake result. 0
event_cb 8e2460 state 5 non-threaded, execute right meow
work_job_execute 8e2460 state 5 moving to NS_JOB_RUNNING
wakeup_cb 8e2460 state 6 wakeup_cb
sds_queue_dequeue: Queue 0x8e1ac0 - ==> dequeuing
sds_queue_dequeue: Queue 0x8e1ac0 - complete head: (nil) tail: (nil)
get_new_event_requests Dequeuing acdda0 with state 5
update_event acdda0 state 5
sds_queue_dequeue: Queue 0x8e1ac0 - ==> dequeuing
sds_queue_dequeue: Queue 0x8e1ac0 - queue exhausted.
work_job_execute PERSIST and RUNNING, remarking 8e2460 as NS_JOB_NEEDS_ARM
work_job_execute 8e2460 state 4 job func complete, sending to rearm...
internal_ns_rearm_job 8e2460 state 4 moving to NS_JOB_ARMED
update_event 8e2460 state 5
event_loop_thread_func woke event queue. rc=1
sds_queue_dequeue: Queue 0x8e1ac0 - ==> dequeuing
sds_queue_dequeue: Queue 0x8e1ac0 - queue exhausted.
ns_thrpool_wait has begun
sds_queue_dequeue: Queue 0x7e9460 - ==> dequeuing
sds_queue_dequeue: Queue 0x7e9460 - complete head: 0x8e2a30 tail: 0x8e5cd0
^C
Thread 1 "ns-slapd" received signal SIGINT, Interrupt.
0x00007ffff524f96d in pthread_join () from /lib64/libpthread.so.0
(gdb) where
#0 0x00007ffff524f96d in pthread_join () from /lib64/libpthread.so.0
#1 0x00007ffff7bd2dba in ns_thrpool_wait (tp=0x8e19a0) at ../389-ds-base/src/nunc-stans/ns/ns_thrpool.c:1564
#2 0x000000000041cb79 in slapd_daemon (ports=0x7fffffffdc10, tp=0x8e19a0) at ../389-ds-base/ldap/servers/slapd/daemon.c:1107
#3 0x0000000000425f0b in main (argc=9, argv=0x7fffffffdd48) at ../389-ds-base/ldap/servers/slapd/main.c:1180
But the process never fully starts. Running start-dirsrv just hangs.
What are you doing to produce this error? I can't reproduce it, and I ran the full ticket test suite with the patch to make sure it worked. I'm confused as to what's going on here....
If this is blocking you, revert it, and we can re-add later. Alternately, set nunc-stans OFF in libglobs.
1651 init_enable_nunc_stans = cfg->enable_nunc_stans = LDAP_ON;
^ That line in libglobs, set to LDAP_OFF.
I do a "make install" as I always do. But now I cannot install servers, and I cannot stop or start them - they all hang here:
#0 0x00007ffff524f96d in pthread_join () from /lib64/libpthread.so.0
#1 0x00007ffff7bd2dba in ns_thrpool_wait (tp=0x8e19a0) at ../389-ds-base/src/nunc-stans/ns/ns_thrpool.c:1564
#2 0x000000000041cb79 in slapd_daemon (ports=0x7fffffffdc10, tp=0x8e19a0) at ../389-ds-base/ldap/servers/slapd/daemon.c:1107
And all the CI tests fail on the Jenkins server. If I go back to before this change everything works fine - which is what I did to work around this problem, but this needs to get fixed ASAP.
Mate, no matter what I do, I can not reproduce this. Can you shoot me an email with thread apply all bt from the "broken" server? As well, can you give me the console output / error log?
I use this configure command (on F25):
CFLAGS='-g -pipe -Wall -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic' CXXFLAGS='-g -pipe -Wall -O0 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic' ../389-ds-base/configure --enable-autobind --with-selinux --with-openldap --with-tmpfiles-d=/etc/tmpfiles.d --with-systemdsystemunitdir=/usr/lib/systemd/system --with-systemdsystemconfdir=/etc/systemd/system --enable-debug --with-systemdgroupname=dirsrv.target --with-fhs --libdir=/usr/lib64 --with-systemd
Then "make install"
Thread apply all shows all the worker threads, but main() is stuck trying to join a thread as shown above. It never detaches the process - you can only "attach" gdb if you use gdb to start the server: "gdb /usr/sbin/ns-slapd" --> set args --> run
There is also nothing in the logs. I even completely wiped my system of all things dirsrv, and started from scratch, but no luck. So for now I can only develop on the 1.3.6 branch.
I just want to chime in and confirm that server doesn't start fully after 19f676a. I'm using make -f rpm.mk srpms and build using mock.
I've managed to reproduce this. It looks like an issue with systemd + this patch. I'm working on it now.
Attachment: 0001-Ticket-49099-resolve-systemd-startup-interaction-wit.patch
This patch resolves the issue :) Very sorry about this issue. My development environment does not use systemd :(
Quick test shows it passes all the basic tests on a systemd enabled system.
Works for me. But I noticed that instance creation now takes 7 seconds instead of 2. I guess this is a side effect of e086b83?
Works for me, everything looks good, ack
commit d3aa098 To ssh://git@pagure.io/389-ds-base.git 15f5f6a..d3aa098 master -> master
Metadata Update from @firstyear: - Issue close_status updated to: fixed - Issue status updated to: Closed (was: Open)
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/2158
If you want to receive further updates on the issue, please navigate to the GitHub issue and click the subscribe button.
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: fixed)