#49921 nsslapd-enable-nunc-stans should only enable new connection handling
Closed: wontfix 6 months ago by vashirov. Opened 2 years ago by tbordaz.

Issue Description

In 1.3.6, nunc stans code was responsible for accepting new connections.

In 1.3.7, nunc stans (libevent) was responsible for managing established connections (read and closure events).

The two parts (new/established connections) were enabled with the same configuration attribute: nsslapd-enable-nunc-stans.

It should be possible to enable the first part (new connections) separately from the second part (established connections). Indeed, the first part addresses the C10K feature and its testing can be more limited. The second part exists to benefit from libevent with a high number of established connections (instead of a large polling array). The test scenario for established connections is more complex (long duration, hanging clients, network issues...)
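If the split were implemented, the configuration could look something like the sketch below. Note that only nsslapd-enable-nunc-stans is a real, existing attribute; the second attribute name is hypothetical, invented here purely to illustrate the shape of the proposal:

```ldif
dn: cn=config
# real, existing attribute: under this proposal it would cover
# mechanism 1 only (nunc-stans accepting new connections)
nsslapd-enable-nunc-stans: on
# hypothetical attribute (name invented for illustration): would
# additionally enable mechanism 2 (established connections)
nsslapd-enable-nunc-stans-established: on
```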

Package Version and Platform

Since 1.3.7

Steps to reproduce

1. nsslapd-enable-nunc-stans: on makes the server use nunc-stans for both new and established connections

Actual results

nsslapd-enable-nunc-stans controls the use of nunc stans for both new and established connections

Expected results

There should be a way to enable nunc stans for new connections only, as well as for new + established connections

Metadata Update from @mreynolds:
- Custom field component adjusted to None
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None
- Custom field type adjusted to None
- Custom field version adjusted to None
- Issue set to the milestone:

2 years ago

That's not how NS is integrated today. NS today is used for detection of IO events, but the slapd thread mechanism is used for actual operation management. And this alone is the source of all our issues. The integration is taking two thread models, and NS is misusing its own event model.

The correct options are either don't use NS at all, or use NS as intended. There is a patch for this in the ns-conn-deadlock ticket which I am rebasing and updating, but I have hit a thread-local storage issue. So this will be fixed once I'm done with that.

Metadata Update from @firstyear:
- Issue assigned to firstyear

2 years ago

@firstyear thanks for your precious feedback. I am not sure I fully understand your point.
First, I would like to investigate the possibility of keeping NS in use. Metrics show it improves new connection throughput, and I think it is a good idea to use it for established connections as well.

My understanding is that we have two mechanisms using NS+libevent:
- persistent ns_handle_new_connection handler on the enabled port
- ns_thrpool_wait handlers (ns_handle_pr_read_ready, ns_handle_closure) on the event queue.

Both mechanisms are enabled with the 'enable_nunc_stans' flag, and I was wondering if it is possible for each mechanism to use its own flag, so that we can select only ns_handle_new_connection, or both.
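The two mechanisms can be pictured with a small, self-contained sketch in plain Python (using the stdlib selectors event loop, not the nunc-stans API; the handler names only mirror ns_handle_new_connection and ns_handle_pr_read_ready/ns_handle_closure for illustration):

```python
import selectors
import socket
import threading

sel = selectors.DefaultSelector()
received = []

def handle_new_connection(listener):
    """Mechanism 1 analogue: persistent handler on the listening socket.
    Accept, then register a per-connection read handler."""
    conn, _addr = listener.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, handle_read_ready)

def handle_read_ready(conn):
    """Mechanism 2 analogue: per-connection handler for read and
    closure events."""
    data = conn.recv(4096)
    if data:
        received.append(data)
    else:                      # EOF: the closure event
        sel.unregister(conn)
        conn.close()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen()
listener.setblocking(False)
sel.register(listener, selectors.EVENT_READ, handle_new_connection)
port = listener.getsockname()[1]

def client():
    c = socket.create_connection(("127.0.0.1", port))
    c.sendall(b"bind request")
    c.close()

t = threading.Thread(target=client)
t.start()

# Drive the loop until the data arrived and the connection was closed.
while not received or len(sel.get_map()) > 1:
    for key, _events in sel.select(timeout=1):
        key.data(key.fileobj)
t.join()

print(b"".join(received))  # b'bind request'
```

Because each kind of handler is registered independently, a per-mechanism flag would simply decide which registrations happen at startup.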

@tbordaz Hey there! Actually, our gain in connection speed is because of ns_handle_pr_read_ready, not new_connection. In the current (legacy) design there is already a connection handling loop that just accepts, so this is the same as new_connection.

The benefit is in the parallelisation of the per-connection IO events being raised and then pushed into the slapi workers.

So there isn't much point adding a flag to split them, because we need the second part, and that's the part that's not integrated properly and causing the issues.

I have rebased and updated my patches for this, I'll put in a PR soon, but there is a thread local storage issue I need to resolve with Mark first. That will fix up the integration. I have some free time next week so I'll do it early then :)

@firstyear, I think we agree there are two mechanisms: 1 - new connections and 2 - established connections. The fact that the major gain in terms of performance was in established connections (ns_handle_pr_read_ready) means that it should be our ultimate goal.

Legacy code uses a single thread (the listener) to poll both the open ports and the established connections. So as the active connection array grows, polling of the open ports is impacted, reducing the server's ability to accept new connections.
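A minimal model of that legacy loop (illustrative only, using pipes as stand-in file descriptors, not server code): the "listening" fd sits in the same poll set as every established connection, so the whole array must be scanned to report even a single accept event.

```python
import os
import select

poll_set = select.poll()

# Stand-in for the listening socket's fd.
accept_r, accept_w = os.pipe()
poll_set.register(accept_r, select.POLLIN)

# Stand-ins for 200 idle established connections: they share the same
# poll set, so every pass must scan past all of them.
established = [os.pipe() for _ in range(200)]
for r, _w in established:
    poll_set.register(r, select.POLLIN)

# A "new connection" arrives on the listening fd...
os.write(accept_w, b"x")
ready = poll_set.poll(0)

# ...but 201 read fds had to be checked to report this one event.
print(len(ready), ready[0][0] == accept_r)  # 1 True
```

The per-pass cost grows linearly with the number of registered connections, which is exactly the pressure on new-connection accept rates described above.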

The first mechanism (1.3.6) used libevent+NS to dedicate a handler per open port. At that time, established connections were still polled via the connection array. So even if the number of active connections increased, it did not reduce the server's ability to accept new connections.

The second mechanism (1.3.7) used libevent+NS to dedicate a handler per active connection. Since this mechanism brings a big gain in connection speed, it is worth re-enabling it ASAP.

The first mechanism was well integrated in 1.3.6 (no bugs/cases) and IMHO is much simpler to test. The second mechanism needs more integration work and has proven more complex to test.

This is the reason for this ticket: I think the first mechanism is a low-hanging fruit. It works and is "easy" to enable again.

I think it's the other way around. Accepting connections benefits little from NS, but existing ones benefit greatly via ns_handle_pr_read_ready. However, ns_handle_pr_read_ready is the problem, as it doesn't match the way NS should work. The issue is that I fixed a huge number of thread safety issues in NS, and that exposed a stack of assumptions and poor integration choices on the DS side.

As a result, I think that case 2, established connections, needs more work to fix, and I am working on it; I need to fix a thread-local storage issue I have encountered. I don't see much point in separating them back out, because I think that just creates more "subtle" issues over time, rather than getting it right.

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.4.2 (was:

a year ago

Metadata Update from @vashirov:
- Issue close_status updated to: wontfix
- Issue status updated to: Closed (was: Open)

6 months ago

389-ds-base is moving from Pagure to GitHub. This means that new issues and pull requests
will be accepted only in 389-ds-base's GitHub repository.

This issue has been cloned to GitHub and is available here:
- https://github.com/389ds/389-ds-base/issues/2980

If you want to receive further updates on the issue, please navigate to the GitHub issue
and click on the subscribe button.

Thank you for your understanding. We apologize for any inconvenience.
