#48977 Nunc-stans for thread workers and connection tree
Closed: wontfix 4 months ago by mreynolds. Opened 3 years ago by firstyear.

The connection table and the locking around it have long been a source of issues and bottlenecks in our code. As a result, it has become a hugely complex part of the application, which makes it even harder to keep working on and improving.

We should replace the current thread worker and connection table design with nunc-stans and a connection tree.

Key to this is that with nunc-stans we no longer need to iterate and poll over the connection table to determine what work needs to be done. This removes a lock (a serialisation point) and a potential source of issues.

We should start a set of nunc-stans worker threads, rather than start_thread()/connection_threadmain().

When a socket is accepted, its connection is inserted into a tree (so we maintain a list of current connections), and the fd is put into nunc-stans as an IO read job. When nunc-stans detects IO, it would call some new work function followed by connection_read_operation(). From there the work proceeds as normal.
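
To illustrate the flow above, here is a minimal sketch that uses Linux epoll as a stand-in for nunc-stans. It is not the nunc-stans API, and io_job, io_job_add(), io_dispatch_once() and handle_readable() are hypothetical names. The point is only that the ready event carries the connection's job pointer, so nothing has to scan a table to find work. The registration is one-shot, on the assumption that a connection is re-added once its work is done.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-connection job. The callback plays the role of the
 * "new work function" that would go on to call connection_read_operation(). */
typedef struct io_job {
    int fd;                                  /* accepted socket */
    void (*on_readable)(struct io_job *job); /* run when the fd has data */
    int fired;                               /* how often the callback read data */
} io_job;

/* Register the fd for a single read event; the job pointer rides along
 * in the event, so the dispatcher never searches for the connection. */
static int io_job_add(int epfd, io_job *job)
{
    assert(job != NULL);
    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT, .data.ptr = job };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, job->fd, &ev);
}

/* Wait up to timeout_ms and run the callback of every ready job.
 * Returns the number of jobs dispatched, or -1 on error. */
static int io_dispatch_once(int epfd, int timeout_ms)
{
    struct epoll_event ready[16];
    int n = epoll_wait(epfd, ready, 16, timeout_ms);
    for (int i = 0; i < n; i++) {
        io_job *job = ready[i].data.ptr;
        job->on_readable(job);
    }
    return n;
}

/* Example callback: drain some bytes and count the invocation. */
static void handle_readable(io_job *job)
{
    char buf[64];
    if (read(job->fd, buf, sizeof(buf)) > 0) {
        job->fired++;
    }
}
```

In a real server each worker thread would run the dispatch loop, and the callback would hand the connection to the existing operation-reading code rather than just draining bytes.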

At this point, we have already eliminated a large burden on the server, as we are not iterating over the connection table for new work.

Additionally, because we choose to use a tree, not a table, adding new connections is fast. When we close a connection, we can quickly remove it from the tree too. Iterating over the tree content is a trivial operation (BFS, DFS, B+Tree walk).
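
As a rough sketch of such a connection tree, the following uses the POSIX tsearch() family from <search.h>, keyed by fd. The conn_entry type and the conn_tree_* helpers are hypothetical and far simpler than the real Connection structure. Locking is deliberately left out; a real server would still need to protect the tree against concurrent access.

```c
#include <assert.h>
#include <search.h>
#include <stdlib.h>

/* Hypothetical connection record, keyed by its socket descriptor. */
typedef struct conn_entry {
    int fd;
} conn_entry;

static void *conn_tree_root = NULL; /* opaque root managed by tsearch() */
static int conn_count;              /* scratch counter used by the walk */

static int conn_cmp(const void *a, const void *b)
{
    const conn_entry *ca = a, *cb = b;
    return (ca->fd > cb->fd) - (ca->fd < cb->fd);
}

/* Insert a connection for fd. Returns the stored entry (the existing
 * one if fd is already present), or NULL on allocation failure. */
static conn_entry *conn_tree_add(int fd)
{
    assert(fd >= 0);
    conn_entry *c = malloc(sizeof(*c));
    if (c == NULL) {
        return NULL;
    }
    c->fd = fd;
    void *node = tsearch(c, &conn_tree_root, conn_cmp);
    if (node == NULL) {
        free(c);
        return NULL;
    }
    conn_entry *stored = *(conn_entry **)node;
    if (stored != c) {
        free(c); /* duplicate fd: keep the entry already in the tree */
    }
    return stored;
}

static conn_entry *conn_tree_find(int fd)
{
    conn_entry key = { .fd = fd };
    void *node = tfind(&key, &conn_tree_root, conn_cmp);
    return node ? *(conn_entry **)node : NULL;
}

/* Remove the connection for fd. Returns 1 if removed, 0 if absent. */
static int conn_tree_remove(int fd)
{
    conn_entry *c = conn_tree_find(fd);
    if (c == NULL) {
        return 0;
    }
    tdelete(c, &conn_tree_root, conn_cmp);
    free(c);
    return 1;
}

/* twalk() visits internal nodes three times and leaves once; counting
 * only the postorder and leaf visits sees every node exactly once. */
static void count_visit(const void *node, VISIT which, int depth)
{
    (void)node;
    (void)depth;
    if (which == postorder || which == leaf) {
        conn_count++;
    }
}

static int conn_tree_count(void)
{
    conn_count = 0;
    twalk(conn_tree_root, count_visit);
    return conn_count;
}
```

The same walk callback could be used for anything that currently iterates the connection table, such as gathering monitor statistics.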

Connections can be timed out by using PRStatus ns_add_io_timeout_job, and the timeout is reset when they are re-added to the work queues. So this is pretty easy to achieve.
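
The reset-on-re-add behaviour can be modelled in a few lines. This sketch does not call nunc-stans; conn_timer, conn_timer_rearm() and conn_timer_expired() are hypothetical names, and the current time is passed in explicitly to keep the example deterministic.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical idle timer attached to a connection. */
typedef struct conn_timer {
    time_t idle_timeout; /* allowed seconds of inactivity */
    time_t deadline;     /* absolute time at which the connection times out */
} conn_timer;

/* Called whenever the connection is (re-)added to the work queue:
 * the idle deadline starts over from "now". */
static void conn_timer_rearm(conn_timer *t, time_t now)
{
    assert(t != NULL);
    t->deadline = now + t->idle_timeout;
}

/* Returns 1 once the deadline has been reached, otherwise 0. */
static int conn_timer_expired(const conn_timer *t, time_t now)
{
    return now >= t->deadline;
}
```

With an event framework the deadline bookkeeping would live in the timeout job itself, so the server would only need to re-add the job after each completed operation.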

For the record, currently with NS enabled, work is done in:

{{{
(gdb) bt
#0  do_search (pb=0x7f7cb0dca9f0) at /home/william/development/389ds/ds/ldap/servers/slapd/search.c:37
#1  0x000000000041f8b8 in connection_dispatch_operation (conn=0x7f7cb15f7200, op=0x61400005fc40, pb=0x7f7cb0dca9f0) at /home/william/development/389ds/ds/ldap/servers/slapd/connection.c:651
#2  0x00000000004251ac in connection_threadmain () at /home/william/development/389ds/ds/ldap/servers/slapd/connection.c:1759
#3  0x00007f7cc46027df in _pt_root (arg=0x612000096040) at ../../../nspr/pr/src/pthreads/ptthread.c:216
#4  0x00007f7cc43c26ba in start_thread (arg=0x7f7cb0dcb700) at pthread_create.c:333
#5  0x00007f7cc40fd3cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
(gdb) cont
}}}

That is, in connection_threadmain() rather than in the NS worker thread. This would change as part of this ticket.

Metadata Update from @firstyear:
- Issue set to the milestone: 1.3.6 backlog

3 years ago

Metadata Update from @firstyear:
- Issue assigned to firstyear

3 years ago

Metadata Update from @firstyear:
- Issue close_status updated to: None
- Issue set to the milestone: 1.4 backlog (was: 1.3.6 backlog)

3 years ago

Metadata Update from @mreynolds:
- Custom field reviewstatus adjusted to None
- Issue tagged with: RFE

2 years ago

Metadata Update from @mreynolds:
- Issue close_status updated to: wontfix
- Issue status updated to: Closed (was: Open)

4 months ago
