#49871 Nunc-stans: event thread yield when closing a connection that is busy
Closed: wontfix 3 years ago Opened 3 years ago by tbordaz.

Issue Description

If a connection needs to be closed, we schedule ns_handle_closure. The problem is that if the connection is still on the active list (refcnt != 0), the event thread yields.

I guess it was done this way because in that case we schedule ns_handle_closure again, so the event thread runs it again immediately. The yield was probably added to prevent the event thread from looping and consuming CPU.

The side effect is that the event thread cannot immediately process another event on another connection.

I think we need to reevaluate the need of the yield in the event thread.

PS:
I think that if we fail to close a connection (because it is busy), we could also reschedule the closure with some kind of delay.
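
To make the "some kind of delay" idea concrete, here is a minimal sketch of how a retry delay could be computed. It is only an illustration: closure_retry_delay, RETRY_BASE_USEC and RETRY_MAX_USEC are hypothetical names and values, not part of nunc-stans, and the doubling policy is an assumption rather than anything the server implements.

```c
#include <sys/time.h>

/* Hypothetical values, chosen only for illustration. */
#define RETRY_BASE_USEC 1000L   /* first retry after 1 ms */
#define RETRY_MAX_USEC 256000L  /* never wait longer than 256 ms */

/*
 * Hypothetical helper (not a nunc-stans function): return the delay to
 * wait before the next closure attempt, given how many attempts have
 * already failed because the connection was still referenced.
 * The delay doubles after each failure and is capped at RETRY_MAX_USEC.
 */
static struct timeval
closure_retry_delay(unsigned int failed_attempts)
{
    long usec = RETRY_BASE_USEC;
    struct timeval tv;

    while (failed_attempts > 0 && usec < RETRY_MAX_USEC) {
        usec *= 2;
        failed_attempts--;
    }
    if (usec > RETRY_MAX_USEC) {
        usec = RETRY_MAX_USEC;
    }
    tv.tv_sec = usec / 1000000L;
    tv.tv_usec = usec % 1000000L;
    return tv;
}
```

The result is a struct timeval, the same type as the tv field of the job printed in the gdb output in this issue, so a timed retry could carry it instead of being dispatched immediately. How such a timed job would actually be scheduled is left open here.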

Package Version and Platform

Observed in 1.3.7.5-24. I do not know whether it could be a consequence of Bug 1597530 - Async operations can hang when the server is running nunc-stans

Steps to reproduce

I have just noticed this kind of pstack in many customer cases

Actual results

The event thread may yield

Expected results

The event thread should not yield and should process events immediately


#0  0x00007f54e55cf9d7 in sched_yield () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f54e659e50d in PR_Sleep (ticks=0) at ../../../nspr/pr/src/pthreads/ptthread.c:787
#2  0x00007f54e8a7cc89 in work_job_execute (job=0x55682d10ed20) at src/nunc-stans/ns/ns_thrpool.c:291
#3  0x00007f54e8a7dbe3 in event_cb (fd=<optimized out>, event=<optimized out>, arg=<optimized out>)
    at src/nunc-stans/ns/ns_event_fw_event.c:118
#4  0x00007f54e5acda14 in event_process_active_single_queue (activeq=0x556816474ff0, base=0x5568162c6c80) at event.c:1350
#5  0x00007f54e5acda14 in event_process_active (base=<optimized out>) at event.c:1420
#6  0x00007f54e5acda14 in event_base_loop (base=0x5568162c6c80, flags=flags@entry=1) at event.c:1621
#7  0x00007f54e8a7deae in ns_event_fw_loop (ns_event_fw_ctx=<optimized out>) at src/nunc-stans/ns/ns_event_fw_event.c:308
#8  0x00007f54e8a7cac9 in event_loop_thread_func (arg=0x55681628ba40) at src/nunc-stans/ns/ns_thrpool.c:581
#9  0x00007f54e5f3ddd5 in start_thread (arg=0x7f54cb094700) at pthread_create.c:308
#10 0x00007f54e55eab3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) frame 2
(gdb) print *job
  func = 0x556814ee4190 <ns_handle_closure>, data = 0x556817ddfd00, job_type = 16, fd = 0x0, tv = {tv_sec = 0, tv_usec = 0}, 
  signal = 0, ns_event_fw_fd = 0x0, ns_event_fw_time = 0x55682016fe60, ns_event_fw_sig = 0x0, output_job_type = 16, 
  state = NS_JOB_NEEDS_DELETE, ns_event_fw_ctx = 0x5568162c6c80, alloc_event_context = 0x7f54e8a7c600 <alloc_event_context>,

static void
ns_handle_closure(struct ns_job_t *job)
{
...
    do_yield = ns_handle_closure_nomutex(c);
...
    if (do_yield) {
        /* closure not done - another reference still outstanding */
        /* yield thread after unlocking conn mutex */
        PR_Sleep(PR_INTERVAL_NO_WAIT); /* yield to allow other thread to release conn */
    }
    return;
}

Metadata Update from @tbordaz:
- Custom field component adjusted to None
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None
- Custom field type adjusted to None
- Custom field version adjusted to None

3 years ago

The event thread can be looping and consuming 100% CPU. It consumes CPU although it always appears to be in sleep, because each time it is processing a different job.
Most likely the job that fails to remove the connection from the active list dispatches a new ns_handle_closure job, which is then executed immediately.

The proof is that several pstacks taken in a row each show a different job.
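
The loop described above can be illustrated with a toy model. This is a single-threaded sketch under stated assumptions, not nunc-stans code: count_closure_attempts and its parameters are hypothetical, time is simulated, and each failed attempt is assumed to dispatch exactly one new closure job, so the return value is also the number of distinct jobs the event thread runs.

```c
#include <stdint.h>

/*
 * Toy model (hypothetical, not nunc-stans code).
 *
 * busy_usec         simulated time until the last reference on the
 *                   connection is released (refcnt reaches 0)
 * attempt_cost_usec simulated cost of running one closure job
 * retry_delay_usec  delay before the re-dispatched job runs;
 *                   0 models the immediate re-dispatch
 *
 * Returns the number of closure jobs executed until one succeeds.
 */
static uint64_t
count_closure_attempts(uint64_t busy_usec,
                       uint64_t attempt_cost_usec,
                       uint64_t retry_delay_usec)
{
    uint64_t now = 0;
    uint64_t attempts = 0;

    if (attempt_cost_usec == 0) {
        attempt_cost_usec = 1; /* keep simulated time moving forward */
    }
    for (;;) {
        attempts++;
        if (now >= busy_usec) {
            return attempts; /* connection no longer referenced: closure done */
        }
        /* Closure failed: pay for this job, then wait for the next one. */
        now += attempt_cost_usec + retry_delay_usec;
    }
}
```

With a connection that stays busy for 10 ms and a job cost of 1 microsecond, count_closure_attempts(10000, 1, 0) returns 10001, while count_closure_attempts(10000, 1, 1000) returns 11. In this model the immediate re-dispatch keeps the event thread occupied with closure jobs for the whole time the connection is busy, which is consistent with successive pstacks showing a different job each time.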

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.4.0

3 years ago

Metadata Update from @tbordaz:
- Issue assigned to tbordaz

3 years ago

Metadata Update from @tbordaz:
- Assignee reset
- Issue close_status updated to: duplicate
- Issue status updated to: Closed (was: Open)

3 years ago

Metadata Update from @tbordaz:
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1605554

3 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/2930

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: duplicate)

2 years ago
