If a connection needs to be closed, we schedule ns_handle_closure. The problem is that if the connection is still on the active list (refcnt != 0), the event thread yields.
I guess it was done like this because in that case we schedule ns_handle_closure again, so the event thread will be called back immediately; the yield was presumably meant to prevent the event thread from looping and consuming CPU.
The side effect is that the event thread cannot immediately process another event on another connection.
I think we need to reevaluate the need for the yield in the event thread.
PS: I think that if we fail to close a connection (because it is busy) we could also reschedule it with some kind of delay, as sketched below.
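A minimal sketch of that idea, not an actual patch: it assumes the nunc-stans ns_add_timeout_job() and ns_job_get_data() calls and a c_tp thread-pool pointer on the connection, and the 100 ms back-off value is purely illustrative.

```c
/* Sketch only: re-arm the closure as a timer job instead of yielding.
 * Assumes ns_add_timeout_job(), ns_job_get_data() and a c->c_tp
 * thread-pool field; the 100 ms back-off is an arbitrary example. */
static void
ns_handle_closure(struct ns_job_t *job)
{
    Connection *c = (Connection *)ns_job_get_data(job);
    int do_yield;

    /* ... lock c->c_mutex, try the closure, unlock, as today ... */
    do_yield = ns_handle_closure_nomutex(c);

    if (do_yield) {
        /* closure not done - another reference still outstanding.
         * Instead of PR_Sleep(PR_INTERVAL_NO_WAIT), schedule a retry a
         * little later so the event thread returns to its loop and can
         * serve other connections while the reference is released. */
        struct timeval tv = {0, 100 * 1000}; /* retry in ~100 ms */
        ns_add_timeout_job(c->c_tp, &tv, NS_JOB_TIMER,
                           ns_handle_closure, c, NULL);
    }
}
```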
Observed in 1.3.7.5-24. I do not know whether it could be a consequence of Bug 1597530 - Async operations can hang when the server is running nunc-stans.
I just noticed this kind of pstack in many customer cases:
The event thread may yield.
The event thread should not yield; it should process events immediately.
```
#0  0x00007f54e55cf9d7 in sched_yield () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f54e659e50d in PR_Sleep (ticks=0) at ../../../nspr/pr/src/pthreads/ptthread.c:787
#2  0x00007f54e8a7cc89 in work_job_execute (job=0x55682d10ed20) at src/nunc-stans/ns/ns_thrpool.c:291
#3  0x00007f54e8a7dbe3 in event_cb (fd=<optimized out>, event=<optimized out>, arg=<optimized out>) at src/nunc-stans/ns/ns_event_fw_event.c:118
#4  0x00007f54e5acda14 in event_process_active_single_queue (activeq=0x556816474ff0, base=0x5568162c6c80) at event.c:1350
#5  0x00007f54e5acda14 in event_process_active (base=<optimized out>) at event.c:1420
#6  0x00007f54e5acda14 in event_base_loop (base=0x5568162c6c80, flags=flags@entry=1) at event.c:1621
#7  0x00007f54e8a7deae in ns_event_fw_loop (ns_event_fw_ctx=<optimized out>) at src/nunc-stans/ns/ns_event_fw_event.c:308
#8  0x00007f54e8a7cac9 in event_loop_thread_func (arg=0x55681628ba40) at src/nunc-stans/ns/ns_thrpool.c:581
#9  0x00007f54e5f3ddd5 in start_thread (arg=0x7f54cb094700) at pthread_create.c:308
#10 0x00007f54e55eab3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

(gdb) frame 2
(gdb) print *job
func = 0x556814ee4190 <ns_handle_closure>, data = 0x556817ddfd00, job_type = 16, fd = 0x0,
tv = {tv_sec = 0, tv_usec = 0}, signal = 0, ns_event_fw_fd = 0x0,
ns_event_fw_time = 0x55682016fe60, ns_event_fw_sig = 0x0, output_job_type = 16,
state = NS_JOB_NEEDS_DELETE, ns_event_fw_ctx = 0x5568162c6c80,
alloc_event_context = 0x7f54e8a7c600 <alloc_event_context>,
```

```c
static void
ns_handle_closure(struct ns_job_t *job)
{
    ...
    do_yield = ns_handle_closure_nomutex(c);
    ...
    if (do_yield) {
        /* closure not done - another reference still outstanding */
        /* yield thread after unlocking conn mutex */
        PR_Sleep(PR_INTERVAL_NO_WAIT); /* yield to allow other thread to release conn */
    }
    return;
}
```
Metadata Update from @tbordaz:
- Custom field component adjusted to None
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None
- Custom field type adjusted to None
- Custom field version adjusted to None
The event thread can be looping and consuming 100% CPU. It consumes CPU even though it always appears to be sleeping, because each time it is processing a different job. Likely the job that fails to remove the connection from the active list dispatches a new ns_handle_closure job, which is then dispatched immediately.
The proof is that taking several pstacks shows a different job each time. The standalone sketch below illustrates the pattern.
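A self-contained illustration of that loop (not 389-ds-base code): a callback that cannot finish, only yields, and is immediately re-dispatched keeps the thread 100% busy, yet every stack sample catches it inside sched_yield(), and each iteration runs as a "new" job.

```c
#include <sched.h>
#include <stdbool.h>

/* Stand-in for "the connection is still on the active list" (refcnt != 0) */
static bool still_referenced(void)
{
    return true;
}

/* Stand-in for ns_handle_closure: it cannot finish, so it just yields */
static void handle_closure(void)
{
    if (still_referenced()) {
        sched_yield(); /* what PR_Sleep(PR_INTERVAL_NO_WAIT) amounts to */
    }
}

int main(void)
{
    /* Stand-in for the event loop: the job is re-armed and dispatched
     * again right away, so the loop never blocks and the core stays at
     * 100% CPU even though every pstack shows the thread "sleeping"
     * in sched_yield(). */
    for (;;) {
        handle_closure();
    }
}
```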
Metadata Update from @mreynolds: - Issue set to the milestone: 1.4.0
Metadata Update from @tbordaz: - Issue assigned to tbordaz
Metadata Update from @tbordaz:
- Assignee reset
- Issue close_status updated to: duplicate
- Issue status updated to: Closed (was: Open)
Duplicate of https://pagure.io/389-ds-base/issue/49815
Metadata Update from @tbordaz: - Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1605554
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/2930
If you want to receive further updates on the issue, please navigate to the github issue and click on the subscribe button.
Thank you for understanding. We apologize for any inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: duplicate)