Bug description: Problematic code in op_shared_search (opshared.c): 703 pr_search_result = pagedresults_get_search_result(pb->pb_conn, operation, pr_idx); 704 if (pr_search_result) { 705 if (pagedresults_is_abandoned_or_notavailable(pb->pb_conn, pr_idx)) { 706 pagedresults_unlock(pb->pb_conn, pr_idx); 707 / Previous operation was abandoned and the simplepaged object is not in use. / 708 send_ldap_result(pb, 0, NULL, "Simple Paged Results Search abandoned", 0, NULL); 709 rc = LDAP_SUCCESS; 710 goto free_and_return; 711 } else { 712 slapi_pblock_set( pb, SLAPI_SEARCH_RESULT_SET, pr_search_result ); 713 rc = send_results_ext (pb, 1, &pnentries, pagesize, &pr_stat); 714 715 / search result could be reset in the backend/dse / 716 slapi_pblock_get(pb, SLAPI_SEARCH_RESULT_SET, &sr); 717 pagedresults_set_search_result(pb->pb, operation, sr, 0, pr_idx); 718 }
The crash occurs in send_results_ext at line 713. #5 0x00007f5e7711b6a7 in send_results_ext (pb=0x1402660, nentries=0x7f5e4bdef384, pagesize=1000, pr_stat=0x7f5e4bdef378, send_result=1) at ldap/servers/slapd/opshared.c:1706 #6 0x00007f5e7711c3c0 in op_shared_search (pb=0x1402660, send_result=1) at ldap/servers/slapd/opshared.c:713
where the pr_search_result_set in the simple paged results handle in the connection is already freed (pr_serearch_result_set == 0x0) as well as pr_flags has CONN_FLAG_PAGEDRESULTS_ABANDONE (== 512) bit as follows: (gdb) p *pb->pb_conn->c_pagedresults->prl_list $8 = {pr_current_be = 0xe8b460, pr_search_result_set = 0x0, pr_search_result_count = 1000, pr_search_result_set_size_estimate = 0, pr_sort_result_code = 0, pr_timelimit = 0, pr_flags = 640, pr_msgid = 5, pr_mutex = 0x7f5db4b12aa0}
but pr_search_result retrieved at line 703 has non-NULL value: (gdb) p pr_search_result $9 = (void *) 0x7f5db5e222a0
Also, pagedresults_is_abandoned_or_notavailable does not return CONN_FLAG_PAGEDRESULTS_ABANDONE was set.
That being said, the abandon happened between the line 705 and 713.
Fix description: The code (from line703 through 717) has to be protected.
git patch file (master) 0001-Ticket-48338-SimplePagedResults-abandon-could-happen.patch
Reviewed by Rich (Thank you!!)
Pushed to master: 83c98d9..390b8bd master -> master commit 390b8bd
Pushed to 389-ds-base-1.3.4: 55434d3..8f49d33 389-ds-base-1.3.4 -> 389-ds-base-1.3.4 commit 8f49d33
Pushed to 389-ds-base-1.3.3: a4db5ec..a7a8b2d 389-ds-base-1.3.3 -> 389-ds-base-1.3.3 commit a7a8b2d
Pushed to 389-ds-base-1.2.11: 24a38f2..d5ccc06 389-ds-base-1.2.11 -> 389-ds-base-1.2.11 commit d5ccc06
Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1278567
git patch file (master) -- fixing a self deadlock introduced by the previous patch 0001-Ticket-48338-SimplePagedResults-abandon-could-happen.patch 0001-Ticket-48338-SimplePagedResults-abandon-could-happen.2.patch
git patch file (master) -- fixing a self deadlock introduced by the previous patch 0001-Ticket-48338-SimplePagedResults-abandon-could-happen.patch (take 2) 0001-Ticket-48338-SimplePagedResults-abandon-could-happen.3.patch
git patch file (master) -- fixing a self deadlock introduced by the previous patch 0001-Ticket-48338-SimplePagedResults?-abandon-could-happen.patch (first phase of take 2) 0001-Ticket-48338-SimplePagedResults-abandon-could-happen.4.patch
Looks good, one minor indentation issue:
ldap/servers/slapd/opshared.c
{{{ diff --git a/ldap/servers/slapd/opshared.c b/ldap/servers/slapd/opshared.c index 8a14431..9f8773f 100644 680 slapi_pblock_set(pb, SLAPI_CONN_MUTEX_LOCKED, &locked); 681 locked = 0; / for setting unlocked / }}}
Nathan and Mark, thanks for reviewing the patch [1].
I reworded the last part of the comment as follows based upon the Nathan's review. {{{ Removing PR_Lock(c_mutex) from slapi_pblock_get(SLAPI_CONN_CLIENTNETADDR) since acquiring the lock is not necessary for the atomic reads. This change solves the self deadlock. }}} [1] 0001-Ticket-48338-SimplePagedResults-abandon-could-happen.4.patch​.
Reviewed by Nathan and Mark (Thank you!!)
Pushed to master: 341c3c1..79ca67d master -> master commit 79ca67d
Pushed to 389-ds-base-1.3.4: c0d35f7..36245ab 389-ds-base-1.3.4 -> 389-ds-base-1.3.4 commit 36245ab
Pushed to 389-ds-base-1.3.3: 7fca878..1d00345 389-ds-base-1.3.3 -> 389-ds-base-1.3.3 commit 1d00345
Pushed to 389-ds-base-1.2.11: d5ccc06..0e2a753 389-ds-base-1.2.11 -> 389-ds-base-1.2.11 commit 0e2a753
For the phase 2, opened a new ticket: https://fedorahosted.org/389/ticket/48349
Note - we are seeing a self deadlock on conn mutex for getting the CONN_DN. Attaching the stack trace showing the self deadlock
deadlock in from acl code deadlock.txt
Could this change be safe enough to make? {{{ diff --git a/ldap/servers/slapd/pblock.c b/ldap/servers/slapd/pblock.c index f2017be..dcb4b31 100644 --- a/ldap/servers/slapd/pblock.c +++ b/ldap/servers/slapd/pblock.c @@ -164,6 +164,7 @@ slapi_pblock_get( Slapi_PBlock pblock, int arg, void value ) ((PRUint64 )value) = pblock->pb_conn->c_connid; break; case SLAPI_CONN_DN: + char cdn; / * NOTE: we have to make a copy of this that the caller * is responsible for freeing. otherwise, they would get @@ -174,10 +175,9 @@ slapi_pblock_get( Slapi_PBlock pblock, int arg, void value ) "Connection is NULL and hence cannot access SLAPI_CONN_DN \n", 0, 0, 0 ); return (-1); } - PR_Lock( pblock->pb_conn->c_mutex ); - (*(char )value) = (NULL == pblock->pb_conn->c_dn ? NULL : - slapi_ch_strdup( pblock->pb_conn->c_dn )); - PR_Unlock( pblock->pb_conn->c_mutex ); + / For fields with atomic access, remove the PR_Lock(c_mutex) / + cdn = pblock->pb_conn->c_dn; + (*(char )value) = (NULL == cdn ? NULL : slapi_ch_strdup(cdn)); break; }}} I'm also worried about other fields in pb_conn such as c_authtype, cin_addr, cin_destaddr, etc...
git patch file (master) -- Revert 0001-Ticket-48338-SimplePagedResults-abandon-could-happen.4.patch 0001-Revert-Ticket-48338-SimplePagedResults-abandon-could.patch
Opened a new ticket for the self deadlock. Ticket #48406 - Avoid self deadlock by PR_Lock(conn->c_mutex)
Reverted 0001-Ticket-48338-SimplePagedResults?-abandon-could-happen.4.patch
Pushed to master: dd90d19..f25f804 master -> master commit 1818478
Pushed to 389-ds-base-1.3.4: b31b676..84da7d0 389-ds-base-1.3.4 -> 389-ds-base-1.3.4 commit c8b1817
Pushed to 389-ds-base-1.3.3: e62ef7f..ad08d1f 389-ds-base-1.3.3 -> 389-ds-base-1.3.3 commit 61c3ff6
Pushed to 389-ds-base-1.2.11: 44413ce..f69d19a 389-ds-base-1.2.11 -> 389-ds-base-1.2.11 commit bab6ec6
Metadata Update from @nhosoi: - Issue set to the milestone: 1.2.11.33
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/1669
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
subscribe
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: Fixed)
Log in to comment on this ticket.