In a group with a high level of churn (i.e. 30,000 members added and then deleted, with 5,000 members remaining), the group performs poorly for any MOD operation that deletes a single member afterwards.
It was reported that purge_entry_state_information() leaves a_deleted_values unsorted, which means that during the delete operation it is re-sorted while the database lock is held.
This issue is avoided in the ADD case by a different algorithm.
The impact is on an object, such as a group, which has many entry state information attributes. Hangs of up to 30+ seconds have been observed.
We should create a test case to simulate this, then follow up by improving the performance of attribute delete.
The set of values is sorted with valueset_array_to_sorted(), which is only called by slapi_valueset_add_attr_valuearray_ext(). slapi_valueset_add_attr_valuearray_ext() is used in a number of locations; in this case, mainly entrywsi.c.
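As a rough illustration of why the ADD path can stay cheap: vs->sorted is an array of indices into vs->va kept in value order (as seen in the traces below), so an add only needs a binary search plus an index shift. The toy types and function names below are illustrative, not the real Slapi_ValueSet structures:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of a sorted valueset: va holds values in insertion order,
 * sorted holds indices into va in value order. Illustrative only. */
typedef struct {
    const char *va[16]; /* values, insertion order */
    size_t sorted[16];  /* indices into va, in value order */
    size_t num;
} toy_valueset;

/* Insert a value while keeping the sorted index array ordered:
 * binary search for the slot, then shift the index array. This is
 * roughly how an add path can remain "sort friendly". */
static void toy_add(toy_valueset *vs, const char *v)
{
    size_t lo = 0, hi = vs->num;
    while (lo < hi) {
        size_t mid = (lo + hi) / 2;
        if (strcmp(vs->va[vs->sorted[mid]], v) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    vs->va[vs->num] = v;
    memmove(&vs->sorted[lo + 1], &vs->sorted[lo],
            (vs->num - lo) * sizeof(size_t));
    vs->sorted[lo] = vs->num;
    vs->num++;
}
```

Each add is O(log n) comparisons plus an index shift, so the set never needs a full re-sort under the lock.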
attachment 0001-Ticket-48894-Issue-with-high-number-of-entry-state-o.patch
I think this patch tries to solve the problem in the wrong place. As stated in the ticket description, it does not occur for add operations because a different algorithm is used. In the implementation of the sorted valueset, the assumption was that as soon as the number of values passes the threshold it would be sorted, and for arrays of ~10 elements quicksort seemed to be some overhead. Could we first investigate why the behaviour in the del is different, and whether it could be made "sort friendly" like the add?
Sorry, it was me who asked William to replace the current insertion sort with quicksort instead of adding on top of it as in the original proposal...
Ludwig, do you think we should go back to the original proposal, which is: if the value count is 10 through 100, use the insertion sort, and if it is larger than 100, use quicksort?
According to the benchmark, it seems the quicksort is beneficial for the large group entries.
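The proposal amounts to a hybrid sort. A minimal sketch over plain ints follows; the threshold constant and the function names are made up for illustration (the real code compares values through vs->sorted, and this is not the actual valueset code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative cutoff; the ticket discussed roughly 10-100 values. */
#define QSORT_THRESHOLD 100

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Hybrid sort as proposed: insertion sort for small arrays (low constant
 * overhead), quicksort (here libc qsort) for large ones. */
static void hybrid_sort(int *a, size_t n)
{
    if (n > QSORT_THRESHOLD) {
        qsort(a, n, sizeof(int), cmp_int);
        return;
    }
    for (size_t i = 1; i < n; i++) {
        int v = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > v) {
            a[j] = a[j - 1]; /* shift larger elements right */
            j--;
        }
        a[j] = v;
    }
}
```

This keeps the small-array case cheap while avoiding insertion sort's O(n^2) behaviour on the large group entries the benchmark covered.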
The patch was provided by someone else; I just cleaned it up and improved it. It does fix their issue apparently, and I'm not familiar enough with this part of the code to comment at length. I believe part of their issue occurs when they are adding a large number of attributes to new objects.
I still agree that we should look at other parts of the code too, and I think this is perhaps a part of our code that could benefit from some examination, and maybe a better way to store values?
It is OK for me to replace the sorting algorithm if it gives immediate relief, but I still would like to understand the real cause of the problem. So we should not just close this ticket, or we should create another one for the core issue.
The indentation in valueset_replace_valuearray_ext() now looks a bit zigzagged, at least in the patch view.
Right, I'll leave this issue open then.
I have fixed the whitespace issue, but it was there before I started. Given that it's just whitespace and the other content of the patch is the same, I'll assume your ack extends to that.
{{{
commit 88b9b9e
Writing objects: 100% (8/8), 2.44 KiB | 0 bytes/s, done.
Total 8 (delta 6), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   3c537b0..88b9b9e  master -> master
}}}
attachment 0001-Ticket-48894-Issues-with-delete-of-entrywsi-with-lar.patch
This patch should finish correcting the issue. The ultimate issue was the removal of vs->sorted during valueset_purge; this fix prevents that occurring. I have tested it with the basic, dynamic and replication suites, and it is passing.
Indentation is off:
{{{
valueset_array_purge()
...
753             if (vs->va[j]->v_csnset == NULL) {
754                 slapi_value_free(&vs->va[j]);
755                 vs->va[j] = NULL;
756             }
}}}
Otherwise, ack.
Fixed the white space issue!
{{{
commit d92394c
Writing objects: 100% (77/77), 10.81 KiB | 0 bytes/s, done.
Total 77 (delta 65), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   86bffc8..b1f434e  master -> master
}}}
This corrects the server crash.
attachment 0001-Ticket-48894-improve-entrywsi-delete.patch
{{{
commit a3e270e
Writing objects: 100% (25/25), 21.42 KiB | 0 bytes/s, done.
Total 25 (delta 19), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   eb358a5..848bd6b  master -> master
}}}
Metadata Update from @firstyear: - Issue assigned to firstyear - Issue set to the milestone: 1.3.6 backlog
We are getting crashes reported by the community related to this change - specifically valueset_array_to_sorted_quick():
{{{
Thread 50 "ns-slapd" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f95860a4700 (LWP 1274)]
0x00007f9702024f91 in valueset_array_to_sorted_quick (a=0xa7c93897a0, vs=vs@entry=0xa7c93897d8, low=low@entry=0, high=111) at ldap/servers/slapd/valueset.c:993
993         } while (valueset_value_cmp(a, vs->va[vs->sorted[i]], vs->va[pivot]) < 0);
}}}
Gathering more information, but unfortunately there is no reproducer at this time.
Metadata Update from @mreynolds: - Issue set to the milestone: None (was: 1.3.6 backlog) - Issue status updated to: Open (was: Closed)
@nweiderm - fyi there is an issue with this fix
Here is the full stacktrace:
{{{
Thread 1 (Thread 0x7f4961c43700 (LWP 1512)):
#0  0x00007f4adcbc1f91 in valueset_array_to_sorted_quick (a=0xae44b5ab40, vs=vs@entry=0xae44b5ab78, low=low@entry=0, high=759) at ldap/servers/slapd/valueset.c:993
        pivot = 0  i = <optimized out>  j = 760
#1  0x00007f4adcbc2082 in valueset_array_to_sorted (a=a@entry=0xae44b5ab40, vs=vs@entry=0xae44b5ab78) at ldap/servers/slapd/valueset.c:970
        i = <optimized out>
#2  0x00007f4adcbc21f9 in valueset_array_purge (a=a@entry=0xae44b5ab40, vs=vs@entry=0xae44b5ab78, csn=csn@entry=0xae4c3b4fd0) at ldap/servers/slapd/valueset.c:771
        i = <optimized out>  j = <optimized out>  nextValue = <optimized out>  numValues = <optimized out>
#3  0x00007f4adcbc2288 in valueset_purge (a=a@entry=0xae44b5ab40, vs=vs@entry=0xae44b5ab78, csn=csn@entry=0xae4c3b4fd0) at ldap/servers/slapd/valueset.c:797
        r = 0
#4  0x00007f4adcb2e2d2 in attr_purge_state_information (entry=entry@entry=0xae4c4dfd40, attr=attr@entry=0xae44b5ab40, csnUpTo=csnUpTo@entry=0xae4c3b4fd0) at ldap/servers/slapd/attr.c:739
#5  0x00007f4adcb4c216 in entry_purge_state_information (e=0xae4c4dfd40, csnUpTo=0xae4c3b4fd0) at ldap/servers/slapd/entrywsi.c:284
        a = 0xae44b5ab40
#6  0x00007f4ad05d2b74 in purge_entry_state_information (pb=pb@entry=0xae4cc725a0) at ldap/servers/plugins/replication/repl5_plugins.c:568
        e = 0xae4c4dfd40  optype = 8  purge_csn = 0xae4c3b4fd0  repl_obj = 0xae44bb73e0  replica = <optimized out>
#7  0x00007f4ad05d46c2 in multimaster_bepreop_modify (pb=0xae4cc725a0) at ldap/servers/plugins/replication/repl5_plugins.c:704
        rc = 0  op = 0xae450bee00  is_replicated_operation = <optimized out>  is_fixup_operation = <optimized out>
#8  0x00007f4adcb8bfad in plugin_call_mmr_plugin_preop (pb=pb@entry=0xae4cc725a0, e=e@entry=0x0, flags=flags@entry=451) at ldap/servers/slapd/plugin_mmr.c:39
        p = 0xae44aaa840  rc = 50  operation = 0xae450bee00
#9  0x00007f4ad0ec53f5 in ldbm_back_modify (pb=<optimized out>) at ldap/servers/slapd/back-ldbm/ldbm_modify.c:626
        cache_rc = 0  new_mod_count = 0  be = 0xae449932b0  inst = 0xae449344e0  li = 0xae448b7180  e = 0xae4cc737a0  ec = 0xae4cc73740  original_entry = 0x0  tmpentry = 0x0  postentry = 0x0  mods = 0xae4c9e4690  mods_original = 0x0  smods = {mods = 0x0, num_elements = 0, num_mods = 0, iterator = 0, free_mods = 0}  txn = {back_txn_txn = 0xae4cc48b00}  parent_txn = 0x0  ruv_c = {old_entry = 0x0, new_entry = 0x0, smods = 0x0, attr_encrypt = 0}  ruv_c_init = 0  retval = 0  msg = <optimized out>  errbuf = 0x0  retry_count = 0  disk_full = 0  ldap_result_code = 0  ldap_result_message = 0x0  rc = 0  operation = 0xae450bee00  addr = 0xae450beef0  is_fixup_operation = 0  is_ruv = 0  opcsn = <optimized out>  repl_op = 8  opreturn = 0  mod_count = <optimized out>  not_an_error = 0  is_noop = 0  fixup_tombstone = 0  ec_locked = 0  result_sent = 0
#10 0x00007f4adcb74add in op_shared_modify (pb=pb@entry=0xae4cc725a0, pw_change=pw_change@entry=0, old_pw=0x0) at ldap/servers/slapd/modify.c:1013
        rc = 0  be = 0xae449932b0  pse = 0x1  referral = 0x0  e = 0x0  dn = 0xae44a66780 "cn=refreshstatus,ou=people,dc=eds,dc=arizona,dc=edu"  normdn = <optimized out>  sdn = 0xae4c9e5140  passin_sdn = 0  mods = 0xae4c9e4690  pw_mod = <optimized out>  tmpmods = 0xae4c9e4690  smods = {mods = 0x0, num_elements = 0, num_mods = 0, iterator = 0, free_mods = 0}  repl_op = 8  internal_op = 0  lastmod = 1  skip_modified_attrs = 0  unhashed_pw_attr = 0x0  operation = 0xae450bee00  pb_conn = 0xae4cdf9c00  errorbuf = "\000C\321\333J\177\000\000\200\020\006D\256\000\000\000\340C\321\333J\177\000\000\b\000\000\000\000\000\000\000@\020\006D\256\000\000\000\267\205\226\333J\177\000\000P4:L\256\000\000\000\240\064:L\256\000\000\000\240\064:L\256\000\000\000\000\063\275\200\313v)v\000'\304aI\177\000\000\000\063\275\200\313v)v\020\000\000\000\000\000\000\000\200\020\006D\256\000\000\000\020\000\000\000\000\000\000\000 \000\000\000\000\000\000\000@\020\006D\256", '\000' <repeats 11 times>, "Xg\246D\256\000\000\000\332\263\226\333J\177\000\000\300O\261L\256\000\000\000\320O\261L\256\000\000\000\260K\261L\256\000\000\000\000\063\275\200\313v)v@\020\006D\256\000\000\000"...  err = <optimized out>  lc_mod = <optimized out>  p = <optimized out>  proxydn = 0x0  proxy_err = <optimized out>  errtext = 0x0
#11 0x00007f4adcb75ee2 in do_modify (pb=pb@entry=0xae4cc725a0) at ldap/servers/slapd/modify.c:383
        operation = 0xae450bee00  pb_conn = 0xae4cdf9c00  smods = {mods = 0x0, num_elements = 0, num_mods = 0, iterator = 0, free_mods = 0}  ber = <optimized out>  tag = <optimized out>  len = 18446744073709551615  normalized_mods = 0xae4c9e4690  mod = 0x0  mods = 0xae44a66740  last = 0xae4c94826f "\240\177\060\033\004\027\062.16.840.1.113730.3.4.2\004"  type = 0x0  old_pw = 0x0  rawdn = 0xae44a66780 "cn=refreshstatus,ou=people,dc=eds,dc=arizona,dc=edu"  minssf_exclude_rootdse = <optimized out>  ignored_some_mods = <optimized out>  has_password_mod = <optimized out>  pw_change = <optimized out>  err = <optimized out>
#12 0x000000ae42dda886 in connection_dispatch_operation (pb=0xae4cc725a0, op=0xae450bee00, conn=0xae4cdf9c00) at ldap/servers/slapd/connection.c:624
        minssf = 0  minssf_exclude_rootdse = <optimized out>  enable_nagle = <optimized out>  pop_cork = 0  is_timedout = 0  curtime = <optimized out>  pb = 0xae4cc725a0  conn = 0xae4cdf9c00  op = 0xae450bee00  tag = 102  need_wakeup = 0  thread_turbo_flag = <optimized out>  ret = <optimized out>  more_data = 0  replication_connection = <optimized out>  doshutdown = 0  maxthreads = 5  bypasspollcnt = <optimized out>  pb_conn = 0xae4cdf9c00  pb_op = 0xae450bee00
#13 0x000000ae42dda886 in connection_threadmain () at ldap/servers/slapd/connection.c:1761
        is_timedout = 0  curtime = <optimized out>  pb = 0xae4cc725a0  conn = 0xae4cdf9c00  op = 0xae450bee00  tag = 102  need_wakeup = 0  thread_turbo_flag = <optimized out>  ret = <optimized out>  more_data = 0  replication_connection = <optimized out>  doshutdown = 0  maxthreads = 5  bypasspollcnt = <optimized out>  pb_conn = 0xae4cdf9c00  pb_op = 0xae450bee00
#14 0x00007f4ada74208b in _pt_root () at /lib64/libnspr4.so
#15 0x00007f4ada0de36d in start_thread () at /lib64/libpthread.so.0
#16 0x00007f4ad9bcdbbf in clone () at /lib64/libc.so.6
}}}
This is where we crash;
{{{
0x00007f9702024f91 in valueset_array_to_sorted_quick (a=0xa7c93897a0, vs=vs@entry=0xa7c93897d8, low=low@entry=0, high=111) at ldap/servers/slapd/valueset.c:993
993         } while (valueset_value_cmp(a, vs->va[vs->sorted[i]], vs->va[pivot]) < 0);
(gdb) p pivot
$2 = 0
(gdb) p i
$1 = <optimized out>
(gdb) p *vs
$6 = {num = 760, max = 1024, sorted = 0xae4d2da000, va = 0xae4d2d8000}
(gdb) p vs->sorted[i]
value has been optimized out
(gdb) p vs->va[vs->sorted[i]]
value has been optimized out
(gdb) p *vs->va
$7 = (struct slapi_value *) 0x0
(gdb) p vs->va[pivot]
$4 = (struct slapi_value *) 0x0
(gdb) p *vs->va[pivot]
Cannot access memory at address 0x0
}}}
Looks like we crash because we dereference a NULL pointer: vs->va[vs->sorted[i]].
Currently there is no reproducer but I'm going to try and work on one.
@mreynolds so, when I was studying the purging algorithm in 1.3 (in the hope of backporting it to our running version), I caught a bug (and it might be the bug that you're hitting here); that's why I decided to work on that new patch instead.
The bug is in the compaction algorithm used in the valueset_array_purge function in this commit: https://pagure.io/389-ds-base/c/a3e270e483b52f212c638b1d9de8bf862d2a3f49.
j inside the while loop should start iterating from nextValue + 1, not 0. The bug will trigger when you have the following va:
[a, b, NULL, d, NULL...]
On the first iteration nextValue and j will be on index 0; nothing is replaced, and nextValue is increased by 1.
On the second iteration nextValue is on index 1, which is (b), not null, but j is still on index 0; nothing is replaced, and nextValue is increased by 1.
On the third iteration nextValue is on index 2, which is null, and j is still on index 0. In this iteration the NULL at index 2 will be replaced with a, and a will be nullified; nextValue will be increased by 1 and j will remain on index 0.
[NULL, b, a, d, NULL...]
Now nextValue is on index 3, which is not null; nothing is replaced, and nextValue is increased by 1.
nextValue is on index 4, which is null; j will be increased by 1 to be on index 1, the NULL at index 4 will be replaced with b, and b will be nullified:
[NULL, NULL, a, d, b...]
So, in that specific case, we will end up with an invalid va array containing invalid and NULL values inside of it.
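A minimal model of the compaction described above reproduces exactly this corruption, and shows that starting j past the hole fixes it. This is an illustrative sketch, not the literal valueset_array_purge code; the function names are made up:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Buggy variant, as described above: j starts at 0 and never resets, so
 * it can pull values from BEFORE the hole, leaving the array reordered
 * and still holed. */
static void compact_buggy(const char **va, size_t n)
{
    size_t j = 0;
    for (size_t nextValue = 0; nextValue < n; nextValue++) {
        if (va[nextValue] == NULL) {
            while (j < n && va[j] == NULL)
                j++;
            if (j < n) {
                va[nextValue] = va[j]; /* may copy from behind the hole */
                va[j] = NULL;
            }
        }
    }
}

/* Fixed variant: j starts iterating from nextValue + 1, so survivors
 * are packed to the front in their original order. */
static void compact_fixed(const char **va, size_t n)
{
    size_t j = 0;
    for (size_t nextValue = 0; nextValue < n; nextValue++) {
        if (va[nextValue] == NULL) {
            if (j < nextValue + 1)
                j = nextValue + 1; /* never look behind the hole */
            while (j < n && va[j] == NULL)
                j++;
            if (j < n) {
                va[nextValue] = va[j];
                va[j] = NULL;
            }
        }
    }
}
```

On the input [a, b, NULL, d, NULL] the buggy variant yields the corrupted [NULL, NULL, a, d, b] from the walkthrough, while the fixed variant yields [a, b, d, NULL, NULL].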
Community member is going to test your patch, but they got a better stack trace. We can see that the 1024 limit is hit; not sure if it's related or not:
{{{
Thread 1 (Thread 0x7f1e7fa12700 (LWP 1837)):
#0  0x00007f1ff9a231c6 in valueset_array_to_sorted_quick (a=0x796c8b25a0, vs=0x796c8b25d8, low=0, high=759) at ldap/servers/slapd/valueset.c:993
        pivot = 0  i = 1024  j = 760
#1  0x00007f1ff9a2312b in valueset_array_to_sorted (a=0x796c8b25a0, vs=0x796c8b25d8) at ldap/servers/slapd/valueset.c:970
        i = 1024
#2  0x00007f1ff9a22b5b in valueset_array_purge (a=0x796c8b25a0, vs=0x796c8b25d8, csn=0x796c8de420) at ldap/servers/slapd/valueset.c:771
        i = 761  j = 1024  nextValue = 760  numValues = 760
#3  0x00007f1ff9a22ba6 in valueset_purge (a=0x796c8b25a0, vs=0x796c8b25d8, csn=0x796c8de420) at ldap/servers/slapd/valueset.c:797
        r = 0
}}}
FYI, the 1024 limit I am referring to is the valueset max, so I was wondering if we were running past the max size?
{{{
(gdb) p i
$5 = 1024
(gdb) p *vs
$6 = {num = 760, max = 1024, sorted = 0xae4d2da000, va = 0xae4d2d8000}
(gdb) p vs->va[vs->sorted[i]]
Cannot access memory at address 0x44519e10020
}}}
In fact I wonder if we need this change:
{{{
diff --git a/ldap/servers/slapd/valueset.c b/ldap/servers/slapd/valueset.c
index dc0360738..5a16b03dc 100644
--- a/ldap/servers/slapd/valueset.c
+++ b/ldap/servers/slapd/valueset.c
@@ -1019,7 +1019,7 @@ valueset_array_to_sorted_quick(const Slapi_Attr *a, Slapi_ValueSet *vs, size_t l
     while (1) {
         do {
             i++;
-        } while (valueset_value_cmp(a, vs->va[vs->sorted[i]], vs->va[pivot]) < 0);
+        } while (i < vs->max && valueset_value_cmp(a, vs->va[vs->sorted[i]], vs->va[pivot]) < 0);
         do {
             j--;
}}}
Seems about right; i and j can go out of bounds here. Maybe you want to add another check for j > 0.
Customer reported that your second patch seems to have resolved the crash, but let's give it a day or two. If it still crashes, we can look into this code change.
Well, the latest patch resolved his issue because it preserves vs->sorted and reduces the calls to valueset_array_to_sorted during the purge, so it's less likely to hit that crash case during the purge now. But based on the stack traces and the gdb data, a different bug was causing the crash for him (it's not the one that I described above). Your assumption of hitting the 1024 limit seems plausible to me: under a similar edge case, if valueset_array_to_sorted were called by a different code path, i can still potentially run past the boundaries and j can potentially underflow. So I would suggest adding i < vs->num and j > 0 checks anyway.
Funny, I thought more about this as well and was planning on adding them. Well, we are in agreement then :-)
Can you review this patch:
attachment 0001-Ticket-48894-harden-valueset_array_to_sorted_quick-v.patch
Metadata Update from @mreynolds: - Custom field reviewstatus adjusted to review (was: ack)
@mreynolds: yep, we're in agreement. Thinking about the check for j again, I think you're right that the check should be j >= 0, but we will need another check in the do block, like:
{{{
do {
    if (j > 0)
        j--;
} while (j >= 0 && valueset_value_cmp(a, vs->va[vs->sorted[j]], vs->va[pivot]) > 0)
}}}
With only the j >= 0 check, when j = 0 the condition will evaluate to true and j will be decremented to -1 (assuming size_t is some unsigned type), so it will underflow and the loop won't break because j will have a value larger than 0 again, right?
Scratch that, the loop won't break with that solution either. Maybe we just have to break if j == 0 inside the do block. The other option is to have j = high + 2 and then use: while ( j > 0 && valueset_value_cmp(a, vs->va[vs->sorted[j-1]], vs->va[pivot]) > 0 )
> scratch that, the loop won't break with that solution either. maybe we just have to break if j == 0 inside the do block,
But don't we want to check vs->sorted[0] ?
> the other option is to have j = high + 2 and then we use while ( j > 0 && valueset_value_cmp(a, vs->va[vs->sorted[j-1]], vs->va[pivot]) > 0 )
But further down we break if i >= j; this logic would change.
Perhaps in the do block we should do:
{{{
j--;
if (j < 0) {
    j = 0;
    break;
}
}}}
When j = 0 inside the do block, before the decrement, vs->sorted[0] has already been checked on the previous iteration; in a do-while, the decrement happens before the condition is checked, right? So in the first option, when j = 0 and is about to be decremented to -1 in the do block, the previous condition was evaluated for j = 0. For the second option, when j = 1 the condition will check vs->sorted[j-1] (i.e. vs->sorted[0]), and then in the next iteration j will be decremented to 0 and the condition will evaluate to false and break. Am I misunderstanding anything?
This patch should be correct now. I just moved the j > 0 check to after the compare inside the while.
attachment 0001-Ticket-48894-harden-valueset_array_to_sorted_quick-v.patch
Yep, that looks good to me :)
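For the record, the hardening pattern agreed on above can be sketched as a Hoare-style partition over plain ints. This is illustrative only (the real code compares valueset values through vs->sorted): the i scan is bounds-checked before indexing, and the j scan checks j > low after the compare, so a size_t index can neither run past the array nor wrap below zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hardened partition sketch: guards keep the size_t indices in range.
 * The j guard sits after the compare, so vs->sorted[0]'s analogue (a[low])
 * is still examined before the loop stops. */
static size_t partition(int *a, size_t low, size_t high)
{
    size_t pivot = low;
    size_t i = low;      /* pre-incremented below */
    size_t j = high + 1; /* pre-decremented below */

    while (1) {
        do {
            i++;
        } while (i <= high && a[i] < a[pivot]); /* bound checked first */
        do {
            j--;
        } while (a[j] > a[pivot] && j > low);   /* no wrap below low */
        if (i >= j)
            break;
        int t = a[i]; a[i] = a[j]; a[j] = t;    /* swap out-of-place pair */
    }
    /* put the pivot element into its final position */
    int t = a[low]; a[low] = a[j]; a[j] = t;
    return j;
}
```

After the call, everything left of the returned index is <= the pivot value and everything right of it is >=, even when low is 0 and the indices are unsigned.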
{{{
52ba2ab..2086d05  master -> master
4ac6ade..a85e247  389-ds-base-1.3.7 -> 389-ds-base-1.3.7
2111276..4248346  389-ds-base-1.3.6 -> 389-ds-base-1.3.6
}}}
Metadata Update from @mreynolds: - Custom field reviewstatus adjusted to ack (was: review) - Issue close_status updated to: fixed - Issue set to the milestone: 1.3.6.0 - Issue status updated to: Closed (was: Open)
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/1953
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
Thank you for understanding. We apologize for any inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: fixed)