#48338 SimplePagedResults -- abandon could happen between the abandon check and sending results
Closed: wontfix None Opened 7 years ago by nhosoi.

Bug description:
Problematic code in op_shared_search (opshared.c):
703 pr_search_result = pagedresults_get_search_result(pb->pb_conn, operation, pr_idx);
704 if (pr_search_result) {
705 if (pagedresults_is_abandoned_or_notavailable(pb->pb_conn, pr_idx)) {
706 pagedresults_unlock(pb->pb_conn, pr_idx);
707 / Previous operation was abandoned and the simplepaged object is not in use. /
708 send_ldap_result(pb, 0, NULL, "Simple Paged Results Search abandoned", 0, NULL);
709 rc = LDAP_SUCCESS;
710 goto free_and_return;
711 } else {
712 slapi_pblock_set( pb, SLAPI_SEARCH_RESULT_SET, pr_search_result );
713 rc = send_results_ext (pb, 1, &pnentries, pagesize, &pr_stat);
715 / search result could be reset in the backend/dse /
716 slapi_pblock_get(pb, SLAPI_SEARCH_RESULT_SET, &sr);
717 pagedresults_set_search_result(pb->pb, operation, sr, 0, pr_idx);
718 }

The crash occurs in send_results_ext at line 713.
#5 0x00007f5e7711b6a7 in send_results_ext (pb=0x1402660, nentries=0x7f5e4bdef384, pagesize=1000,
pr_stat=0x7f5e4bdef378, send_result=1) at ldap/servers/slapd/opshared.c:1706
#6 0x00007f5e7711c3c0 in op_shared_search (pb=0x1402660, send_result=1) at ldap/servers/slapd/opshared.c:713

where the pr_search_result_set in the simple paged results handle in the connection is already freed (pr_serearch_result_set == 0x0) as well as pr_flags has CONN_FLAG_PAGEDRESULTS_ABANDONE (== 512) bit as follows:
(gdb) p *pb->pb_conn->c_pagedresults->prl_list
$8 = {pr_current_be = 0xe8b460, pr_search_result_set = 0x0, pr_search_result_count = 1000,
pr_search_result_set_size_estimate = 0, pr_sort_result_code = 0, pr_timelimit = 0, pr_flags = 640,
pr_msgid = 5, pr_mutex = 0x7f5db4b12aa0}

but pr_search_result retrieved at line 703 has non-NULL value:
(gdb) p pr_search_result
$9 = (void *) 0x7f5db5e222a0

Also, pagedresults_is_abandoned_or_notavailable does not return CONN_FLAG_PAGEDRESULTS_ABANDONE was set.

That being said, the abandon happened between the line 705 and 713.

Fix description:
The code (from line703 through 717) has to be protected.

Reviewed by Rich (Thank you!!)

Pushed to master:
83c98d9..390b8bd master -> master
commit 390b8bd

Pushed to 389-ds-base-1.3.4:
55434d3..8f49d33 389-ds-base-1.3.4 -> 389-ds-base-1.3.4
commit 8f49d33

Pushed to 389-ds-base-1.3.3:
a4db5ec..a7a8b2d 389-ds-base-1.3.3 -> 389-ds-base-1.3.3
commit a7a8b2d

Pushed to 389-ds-base-1.2.11:
24a38f2..d5ccc06 389-ds-base-1.2.11 -> 389-ds-base-1.2.11
commit d5ccc06

git patch file (master) -- fixing a self deadlock introduced by the previous patch 0001-Ticket-48338-SimplePagedResults-abandon-could-happen.patch

git patch file (master) -- fixing a self deadlock introduced by the previous patch 0001-Ticket-48338-SimplePagedResults-abandon-could-happen.patch (take 2)

git patch file (master) -- fixing a self deadlock introduced by the previous patch 0001-Ticket-48338-SimplePagedResults?-abandon-could-happen.patch (first phase of take 2)

Looks good, one minor indentation issue:


diff --git a/ldap/servers/slapd/opshared.c b/ldap/servers/slapd/opshared.c
index 8a14431..9f8773f 100644
680 slapi_pblock_set(pb, SLAPI_CONN_MUTEX_LOCKED, &locked);
681 locked = 0; / for setting unlocked /

Nathan and Mark, thanks for reviewing the patch [1].

I reworded the last part of the comment as follows based upon the Nathan's review.
Removing PR_Lock(c_mutex) from slapi_pblock_get(SLAPI_CONN_CLIENTNETADDR) since
acquiring the lock is not necessary for the atomic reads. This change solves
the self deadlock.
[1] 0001-Ticket-48338-SimplePagedResults-abandon-could-happen.4.patch​.

Reviewed by Nathan and Mark (Thank you!!)

Pushed to master:
341c3c1..79ca67d master -> master
commit 79ca67d

Pushed to 389-ds-base-1.3.4:
c0d35f7..36245ab 389-ds-base-1.3.4 -> 389-ds-base-1.3.4
commit 36245ab

Pushed to 389-ds-base-1.3.3:
7fca878..1d00345 389-ds-base-1.3.3 -> 389-ds-base-1.3.3
commit 1d00345

Pushed to 389-ds-base-1.2.11:
d5ccc06..0e2a753 389-ds-base-1.2.11 -> 389-ds-base-1.2.11
commit 0e2a753

Note - we are seeing a self deadlock on conn mutex for getting the CONN_DN. Attaching the stack trace showing the self deadlock

Could this change be safe enough to make?
diff --git a/ldap/servers/slapd/pblock.c b/ldap/servers/slapd/pblock.c
index f2017be..dcb4b31 100644
--- a/ldap/servers/slapd/pblock.c
+++ b/ldap/servers/slapd/pblock.c
@@ -164,6 +164,7 @@ slapi_pblock_get( Slapi_PBlock pblock, int arg, void value )
((PRUint64 )value) = pblock->pb_conn->c_connid;
+ char cdn;

* NOTE: we have to make a copy of this that the caller
* is responsible for freeing. otherwise, they would get
@@ -174,10 +175,9 @@ slapi_pblock_get( Slapi_PBlock pblock, int arg, void value )
"Connection is NULL and hence cannot access SLAPI_CONN_DN \n", 0, 0, 0 );
return (-1);
- PR_Lock( pblock->pb_conn->c_mutex );
- (*(char )value) = (NULL == pblock->pb_conn->c_dn ? NULL :
- slapi_ch_strdup( pblock->pb_conn->c_dn ));
- PR_Unlock( pblock->pb_conn->c_mutex );
+ / For fields with atomic access, remove the PR_Lock(c_mutex) /
+ cdn = pblock->pb_conn->c_dn;
+ (*(char
)value) = (NULL == cdn ? NULL : slapi_ch_strdup(cdn));
I'm also worried about other fields in pb_conn such as c_authtype, cin_addr, cin_destaddr, etc...

git patch file (master) -- Revert 0001-Ticket-48338-SimplePagedResults-abandon-could-happen.4.patch

Opened a new ticket for the self deadlock.
Ticket #48406 - Avoid self deadlock by PR_Lock(conn->c_mutex)

Reverted 0001-Ticket-48338-SimplePagedResults?-abandon-could-happen.4.patch

Pushed to master:
dd90d19..f25f804 master -> master
commit 1818478

Pushed to 389-ds-base-1.3.4:
b31b676..84da7d0 389-ds-base-1.3.4 -> 389-ds-base-1.3.4
commit c8b1817

Pushed to 389-ds-base-1.3.3:
e62ef7f..ad08d1f 389-ds-base-1.3.3 -> 389-ds-base-1.3.3
commit 61c3ff6

Pushed to 389-ds-base-1.2.11:
44413ce..f69d19a 389-ds-base-1.2.11 -> 389-ds-base-1.2.11
commit bab6ec6

Metadata Update from @nhosoi:
- Issue set to the milestone:

5 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/1669

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

2 years ago

Login to comment on this ticket.