#49401 Improve valueset sort performance during valueset purging
Closed: wontfix 6 years ago Opened 6 years ago by firstyear.

Issue Description

valueset sorted maintains a list of syntax sorted references to the attributes of the entry. During addition these are modified and added properly, so they stay sorted.

However, in the past to maintain the sorted property, during a delete we would need to remove the vs->sorted array, and recreate it via qsort,

While this was an improvement from past (where we would removed vs->sorted during an attr delete), it still has performance implications on very very large datasets, IE 50,000 member groups with addition/deletion, large entry caches and replication.

This performance can be improved through a better management of deletion events in the valueset api.


Metadata Update from @firstyear:
- Issue assigned to firstyear

6 years ago

Metadata Update from @firstyear:
- Custom field component adjusted to None
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to review
- Custom field type adjusted to None
- Custom field version adjusted to None

6 years ago

This is a patch which would deserve cmocka based unit test.
/me hides :-)

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.4 backlog

6 years ago

the patch looks good.
is there a need for an extra loop to set the -1 in the sorted array, it could be done in one loop

  csnset_purge(&(vs->va[vs->sorted[i]]->v_csnset),csn);
  if (vs->va[vs->sorted[i]->v_csnset == NULL) {
        ...
        vs->va[vs->sorted[i]]  = NULL
        vs->sorted[i] = -1

@lkrispen Actually I made it in two loops for the sake of this early check:

   if (s != numValues) {
       slapi_ch_free ((void **)&vs->sorted);
       vs->sorted = NULL;
   }

I wanted to make sure we have the same count of surviving values in both vs->va and vs->sorted arrays (or otherwise invalidate and remove vs->sorted) before we proceed with the array compact.
If you think this check isn't required, I can remove it and purge both in a single loop.

Well, now that I think about it again, it seems I have just been extra cautious, I think it should be safe to remove that check and purge them in the same loop, will update the patch shortly.

This looks good to me, I'll leave @lkrispen to set ack however :)

@lslebodn I agree it would be greate to have a cmocka test here, but I think there is a lot of parts we would need to create first to really enable this to work properly. I think we should get to this point though,

The patch is also looking great to me.
Few minor comments. The call to valueset_array_to_sorted is systematic (whatever the number of values there are in the array). It used to be triggered only threshold (VALUESET_ARRAY_SORT_THRESHOLD). I have not real concern about it but as far as I understand it means that all valusets will be sorted even from small number of values.

Also, reallocate of 'va' and 'sorted' arrays is there to shrink the array.
Would it be an issue to keep a larger value array than it is strictly needed. It could eliminate a call to realloc and keep rooms if later new value are added.

@tbordaz A asked to have the realloc added. If the entry is long lived in the entry cache, than this means that we may have some unbounded growth of memory. I would rather us reduce the allocation instead. Perhaps we could do a reduction to say have a few elements free instead? So if we go from say size 100 elements to 50, maybe we keep some extra to prevent the next alloc? But I also think this is a lot of excess work ....

@firstyear, I understand the benefit of shrinking as must as possible such value arrays. However, valueset is smart enough to work even with extra available slots (in 'va' and 'sorted'). We could shrink only if it recovers enough space (i.e. 100 slots => 100*(8b+4b) ~ 1K), else it will save a call to realloc.
Also if we shrink those arrays to the exact required size that also mean we need to alloc new arrays each time we add value(s).

ok, hadn't looked into th erealloc problem before, I think there is still some unnecessary overhead.

We allocate va2 with the required size, then populate it with the vauues in sorted order and then go into copying them into va, using dup and free and then resize va.
Why do we not just set va as valuearray in the valueset and free the old va ? the va will then have the proper size already.

Yes, actually, this is a better approach @lkrispen I'm surprised I didn't see it :)

Sorry, I caught a miss while trying to test the previous patch.

I think you do not need to call slapi_valueset_done() before replacing the arrays, keep the references and just free the array itself

The patch looks good to me (and actually really great one !). you have my ack

Yes, I think valueset done would free the values, which wuold cause a bug. Just free the va.

Yes, I think valueset done would free the values, which wuold cause a bug. Just free the va.

no. with @nweiderm latest patch it should work, but has an overhead. when, in the patch, the value is added to va2 it is now duplicated, then valuset_done frees va and va2 is set to the valueset, so it should work, but I think the alloacation in duplicating th evalue and the free in valuset done can be avoided.
just pass the value ref to va2, then only free the array va and set va2, same for sorted, no valueset_done needed

Sorry, I caught a miss while trying to test the previous patch.

I think you do not need to call slapi_valueset_done() before replacing the arrays, keep the references and just free the array itself

Yep, I did it this way initially and actually that's why there was that miss, but then I thought that there might be unfreed values in va that could leak, that's why I thought of disposing off vs and rebuilding it, am I mistaken in this point and being unnecessarily extra cautious?

I think that the values would already be freed in the purge, the array just has dangling pointers at this point. If you do a build and test with asan you'll see pretty fast if you have a leak or not. I think that Ludwig's advice is correct, you can just swap va/va2 and you don't need to free anything from valueset_done (tbh valueset done may actually introduce a heap-use-after-free in this context ...)

Yep, I did it this way initially and actually that's why there was that miss, but then I thought that there might be unfreed values in va that could leak, that's why I thought of disposing off vs and rebuilding it, am I mistaken in this point and being unnecessarily extra cautious?

yes, I think you are too cautious. if you process va then some of the values are freed an the reference in va is NULLed, then the remaining references are copied to va2. So there should be no more danglin references in va and you can just free the array itself.

Metadata Update from @lkrispen:
- Custom field reviewstatus adjusted to ack (was: review)

6 years ago

I'll merge this for you shortly @nweiderm

commit a43a8ef
To ssh://git@pagure.io/389-ds-base.git
63a0a59..a43a8ef master -> master

Thank you so much @nweiderm this is a great contribution. I look forward to working with you on many more,

Thank you very much all for the review. Thanks for the merge @firstyear. Of course, likewise :)

This commit trigger compiler warnings:

../389-ds-base/ldap/servers/slapd/valueset.c: In function ‘valueset_array_purge’:
../389-ds-base/ldap/servers/slapd/valueset.c:788:24: warning: assignment from incompatible pointer type [-Wincompatible-pointer-types]
             vs->sorted = sorted2;
                        ^
../389-ds-base/ldap/servers/slapd/valueset.c:807:20: warning: assignment from incompatible pointer type [-Wincompatible-pointer-types]
         vs->sorted = (int *) slapi_ch_malloc( vs->max* sizeof(int));

Oh, that's because I wrote the patch for our running version 1.2.11, and in that version sorted was integer pointer, but in the master it's size_t pointer.

a85e247..71b109f 389-ds-base-1.3.7 -> 389-ds-base-1.3.7

71b109f..01530f4 389-ds-base-1.3.7 -> 389-ds-base-1.3.7 --> compiler warning

4248346..3c8c692 389-ds-base-1.3.6 -> 389-ds-base-1.3.6

3c8c692..e6ac5ec 389-ds-base-1.3.6 -> 389-ds-base-1.3.6 --> compiler warning

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.3.6.0 (was: 1.4 backlog)

6 years ago

Closed as this is complete.

Metadata Update from @firstyear:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

6 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/2460

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: fixed)

3 years ago

Login to comment on this ticket.

Metadata