#49848 Nunc-stans event thread can hang briefly when a job is (re)armed
Closed: wontfix 3 years ago Opened 3 years ago by tbordaz.

Issue Description

When a nunc-stans job is rearmed, the job is enqueued (event_q) and the event thread is notified.

The event thread dequeues the job, locks it, runs its callback, and unlocks the job.
The problem is that during rearm, the job lock is released only after the notification. So if the event thread is scheduled immediately upon notification, and the dequeued job is the one just rearmed, the event thread "hangs" until the thread running the rearm unlocks the job.

The signature of the bug is:

Thread 54 (Thread 0x7feb416ee700 (LWP 1305)):
#0  0x00007feb7be2f42d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007feb7be2ade6 in _L_lock_870 () from /lib64/libpthread.so.0
#2  0x00007feb7be2acdf in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007feb7e5c96cc in update_event () from /usr/lib64/dirsrv/libnunc-stans.so.0
#4  0x00007feb7e5c99ed in get_new_event_requests.isra.3 () from /usr/lib64/dirsrv/libnunc-stans.so.0
#5  0x00007feb7e5c9ad1 in wakeup_cb () from /usr/lib64/dirsrv/libnunc-stans.so.0
#6  0x00007feb7e5c9bf9 in work_job_execute () from /usr/lib64/dirsrv/libnunc-stans.so.0
#7  0x00007feb7e5ca9cb in event_cb () from /usr/lib64/dirsrv/libnunc-stans.so.0
#8  0x00007feb7bbe9a14 in event_base_loop () from /lib64/libevent-2.0.so.5
#9  0x00007feb7e5cac4e in ns_event_fw_loop () from /usr/lib64/dirsrv/libnunc-stans.so.0
#10 0x00007feb7e5c9a39 in event_loop_thread_func () from /usr/lib64/dirsrv/libnunc-stans.so.0
#11 0x00007feb7be28e25 in start_thread () from /lib64/libpthread.so.0
#12 0x00007feb7b70a34d in clone () from /lib64/libc.so.6

Thread 17 (Thread 0x7feb2eec9700 (LWP 1342)):
#0  0x00007feb7b6eee47 in sched_yield () from /lib64/libc.so.6
#1  0x00007feb7c48931d in PR_Sleep () from /lib64/libnspr4.so
#2  0x00007feb7e5c9905 in internal_ns_job_rearm () from /usr/lib64/dirsrv/libnunc-stans.so.0
#3  0x00007feb7e5ca0c2 in ns_add_io_timeout_job () from /usr/lib64/dirsrv/libnunc-stans.so.0
#4  0x000056012824254b in ns_connection_post_io_or_closing ()
#5  0x000056012823f67a in connection_threadmain ()
#6  0x00007feb7c4889bb in _pt_root () from /lib64/libnspr4.so
#7  0x00007feb7be28e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007feb7b70a34d in clone () from /lib64/libc.so.6

Consequences: The problem is transient, since the job lock is eventually released and the event thread can continue. However, while the event thread is waiting for the lock, the server is likely not to process received events; they are processed with a delay.
Some reports suggest that DS may transiently miss new connections (accept). This is possible, but not confirmed.

Package Version and Platform

Since 7.4 (1.3.6)

Steps to reproduce

No reproducer identified so far.

Actual results

DS may appear to hang for a short period (not processing new connections, new requests, signals, timers, ...).

This period should be very short, so it is not clear whether it has a significant impact.

Expected results

The event thread should never wait for a lock.

Metadata Update from @tbordaz:
- Issue assigned to tbordaz

3 years ago

Metadata Update from @mreynolds:
- Custom field component adjusted to None
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None
- Custom field type adjusted to None
- Custom field version adjusted to None
- Issue set to the milestone: 1.4.0

3 years ago

Metadata Update from @tbordaz:
- Issue close_status updated to: wontfix
- Issue status updated to: Closed (was: Open)

3 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/2907

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for any inconvenience.
