PR#50394: Ticket 50389 - ns-slapd craches while two threads are polling the same connection - 389-ds-base

389-ds-base

#50394 Ticket 50389 - ns-slapd craches while two threads are polling the same connection

Closed 3 years ago by spichugi. Opened 4 years ago by tbordaz.

tbordaz/389-ds-base Ticket-50389 into master

Ticket 50389 - ns-slapd craches while two threads are polling the same connection

Thierry Bordaz • 4 years ago

2886ba7

ldap/servers/slapd/connection.c

file modified

+3 -1

		`@@ -931,7 +931,7 @@`
		`#define CONN_DONE 3`
		`#define CONN_TIMEDOUT 4`

		`- #define CONN_TURBO_TIMEOUT_INTERVAL 400 /* milliseconds */`
		`+ #define CONN_TURBO_TIMEOUT_INTERVAL 100 /* milliseconds */`
		`#define CONN_TURBO_TIMEOUT_MAXIMUM 5 /* attempts * interval IE 2000ms with 400 * 5 */`
		`#define CONN_TURBO_CHECK_INTERVAL 5 /* seconds */`
		`#define CONN_TURBO_PERCENTILE 50 /* proportion of threads allowed to be in turbo mode */`
		`@@ -1187,7 +1187,9 @@`
		`pr_pd.fd = (PRFileDesc *)conn->c_prfd;`
		`pr_pd.in_flags = PR_POLL_READ;`
		`pr_pd.out_flags = 0;`
		`+ PR_Lock(conn->c_pdumutex);`
		`ret = PR_Poll(&pr_pd, 1, timeout);`
		`+ PR_Unlock(conn->c_pdumutex);`
		`waits_done++;`
		`/* Did we time out ? */`
		`if (0 == ret) {`

ldap/servers/slapd/daemon.c

file modified

		`@@ -1940,6 +1940,8 @@`
		`* or something goes seriously wrong. Otherwise, return 0.`
		`* If -1 is returned, PR_GetError() explains why.`
		`* Revision: handle changed to void * to allow 64bit support`
		`+ *`
		`+ * Caller (flush_ber) must hold conn->c_pdumutex`
		`*/`
		`static int`
		`slapd_poll(void *handle, int output)`

tbordaz commented 4 years ago

Bug Description:
nspr IO is not multi-threaded safe.
389-ds should not be in a situation where several threads are polling
a same connection at the same time.
The scenario is a worker send back an operation result at the same time
another worker wants to read an incoming request.

Fix Description:
The fix consist in synchonizing polling with c_pdumutex.

The thread that sends data (flush_ber) hold c_pdumutex.

The thread that reads the data does a non blocking read. It then
enforce ioblocktimeout with iteration of poll.
The reading thread must hold c_pdumutex during poll to synchronize
with the reader thread.
The reading thread must poll with a small timeout
(CONN_TURBO_TIMEOUT_INTERVAL). In order to not block
the thread that send back data, the fix reduces the delay to 0.1s.

https://pagure.io/389-ds-base/issue/50389

Reviewed by: ?

Platforms tested: F28

Flag Day: no

Doc impact: no

mreynolds commented 4 years ago

LGTM!

mhonek commented 4 years ago

LGTM as well! But we should not forget to add a test case once we have a reliable reproducer -- I would keep the associated issue open or file a new one solely for the test case. Thanks.

firstyear commented 4 years ago

@mhonek It's hard to get reproducers for connection code :S especially where timing and races are involved :(

Anyway, ack from me too. Looks like you're popular @tbordaz :)

rebased onto 847b65e01ffe0f776e045e07a46ce494a253c260

4 years ago

tbordaz commented 4 years ago

@mreynolds , @mhonek , @firstyear thank you all for your review and ACK.
I think connection handling is regular popular component :) . I hope it will be less popular with nunc stans re-enabled by default !

It is possible there is a reproducer and waiting some feedback.
If there is no reproducer I will merge the PR.
If there is a reproducer the merge will be delayed to get the result.

firstyear commented 4 years ago

@tbordaz nunc-stans is certainly on my todo list, I would like that cleaned up and merged, but I remember there were some issues with replication I would like to investigate.

rebased onto 2886ba7

4 years ago

tbordaz commented 4 years ago

Finally there is no identified reproducer. Merging the PR. Thanks for the reviews

Pull-Request has been merged by tbordaz

4 years ago

spichugi commented 3 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This pull request has been cloned to Github as issue and is available here:
- https://github.com/389ds/389-ds-base/issues/3452

If you want to continue to work on the PR, please navigate to the github issue,
download the patch from the attachments and file a new pull request.

Thank you for understanding. We apologize for all inconvenience.

Pull-Request has been closed by spichugi

3 years ago

Metadata

Assignee

None

Tags

No Tags

Changes Summary 2

+3 -1

file changed

ldap/servers/slapd/connection.c

+2 -0

file changed

ldap/servers/slapd/daemon.c

389-ds-base

Source Code

#50394 Ticket 50389 - ns-slapd craches while two threads are polling the same connection Closed 3 years ago by spichugi. Opened 4 years ago by tbordaz. tbordaz/389-ds-base Ticket-50389 into master

Metadata

Changes Summary 2

#50394 Ticket 50389 - ns-slapd craches while two threads are polling the same connection

Closed 3 years ago by spichugi. Opened 4 years ago by tbordaz.

tbordaz/389-ds-base Ticket-50389 into master