#49290 Performance regression (from 1.3.4) in complex searches with time syntax
Closed: wontfix 7 years ago Opened 8 years ago by pj101.

Issue Description

The same search filter applied to the same data (imported LDIF) on the same hardware and with the same indexes (the installation and import are scripted) results in a 19s search time on 389ds v1.3.4.14 and ~95s on 389ds v1.3.6.6. So the search time is multiplied by 5 (!!!). I found this when we put 1.3.6.6 (+ latest commits, compiled from source) into production - our synchronization scripts started getting LDAP_TIMELIMIT_EXCEEDED errors. The other problem I see in production (and haven't seen in tests) is one that already has a recently opened ticket - https://pagure.io/389-ds-base/issue/49289

Package Version and Platform

The server is recompiled from git sources (so 1.3.6.6 + all newer commits up to today). I have tried compiling with and without tcmalloc, and tried disabling and enabling nunc-stans; it does not change anything. So the problem may come from some other place, probably where the time-syntax comparison happens or where filters with "or" logic are handled. Maybe it has something to do with the new high-resolution time and an increased cost of time comparison somewhere.

Steps to reproduce

  1. Create ~50,000 entries in LDAP. Modify some of them so they have different modifyTimestamp values.
  2. Create a specific filter with an indexed attribute. I took nsUniqueId. The filter looks as follows:
    '(|(nsUniqueId=5ff1b313-1dd211b2-800db072-06120000)(nsUniqueId=90372806-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=90372808-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=9037280c-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=...)...)
    The nsUniqueId values should be existing values of entries in the LDAP. The filter should contain about 400-500 different nsUniqueId values. Put the filter in a file, say /tmp/filt.txt
  3. Make a search with a user who can see the nsUniqueId attribute:
    ldapsearch -Y GSSAPI -h ldap-model.polytechnique.fr -b 'dc=example,dc=com' `cat /tmp/filt.txt` > /dev/null

Check the LDAP logs, everything is fine, time ~0.6s:
[14/Jun/2017:16:15:58.157246892 +0200] conn=281 op=3 SRCH base="dc=example,dc=com" scope=2 filter="(|(nsUniqueId=5ff1b313-1dd211b2-800db072-06120000)(nsUniqueId=90372806-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=90372808-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=9037280c-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=b3fa6e04-1dd111b2-80addab2-9b8b0000)(nsUniqueId=d7bdb402-1dd111b2-80addab2-9b8b0000)(nsUniqueId=d7bdb404-1dd111b2-80addab2-9b8b0000)(nsUniqueId=fb80fa09-1dd111b2-80addab2-9b8b0000)(nsUniqueId=f02ece05-1dd111b2-80afdab2-9b8b0000)(nsUniqueId=aa750425-1dd111b2-80b5dab2-9b8b0000)(nsUniqueId=aa750427-1dd111b2-80b5dab2-9b8b0000)(nsUniqueId=aa75042f-1dd111b2-80b5dab2-9b8b0000)(nsUniqueId=a4cbee07-1dd111b2-80b6dab2-9b8b0000)(nsUniqueId=1015c002-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=579c4c06-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=579c4c0d-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=579c4c23-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=7b5f920d-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=7b5f9212-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=2e2ff003-1dd211b2-80b7dab2-9b8b0000)(nsUniqueId=4c4a2007-1dd211b2-80b8dab2-9b8b0000)(nsUniqueId=b7..." attrs=ALL
[14/Jun/2017:16:15:58.777008483 +0200] conn=281 op=3 RESULT err=0 tag=101 nentries=425 etime=0.623000
[14/Jun/2017:16:15:58.778350415 +0200] conn=281 op=4 UNBIND

  4. Make a search with a user who can see the modifyTimestamp attribute (>= may even be replaced by =; it does not change the time taken):
    ldapsearch -Y GSSAPI -h ldap-model.polytechnique.fr -b 'dc=example,dc=com' '(modifyTimestamp>=20170614075000.0Z)' > /dev/null

Check the LDAP logs, everything is fine, time ~0.4s:
[14/Jun/2017:16:18:19.707345434 +0200] conn=288 op=3 SRCH base="dc=example,dc=com" scope=2 filter="(modifyTimestamp>=20170614075000.0Z)" attrs=ALL
[14/Jun/2017:16:18:20.089818639 +0200] conn=288 op=3 RESULT err=0 tag=101 nentries=3 etime=0.383000

  5. Now do a combined search using "or" logic over these two filters; you can use >= or = for the time comparison - (|(modifyTimestamp>=20170614075000.0Z)(nsUniqueId=...)(nsUniqueId=...)...)

ldapsearch -Y GSSAPI -h ldap-model.polytechnique.fr -b 'dc=example,dc=com' '(|(modifyTimestamp>=20170614075000.0Z)'`cat /tmp/filter-nsuniqueid.txt`')' > /dev/null

Check the log, the new search takes 94 seconds.
[14/Jun/2017:16:22:30.811592927 +0200] conn=291 op=3 SRCH base="dc=example,dc=com" scope=2 filter="(|(modifyTimestamp>=20170614075000.0Z)(|(nsUniqueId=5ff1b313-1dd211b2-800db072-06120000)(nsUniqueId=90372806-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=90372808-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=9037280c-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=b3fa6e04-1dd111b2-80addab2-9b8b0000)(nsUniqueId=d7bdb402-1dd111b2-80addab2-9b8b0000)(nsUniqueId=d7bdb404-1dd111b2-80addab2-9b8b0000)(nsUniqueId=fb80fa09-1dd111b2-80addab2-9b8b0000)(nsUniqueId=f02ece05-1dd111b2-80afdab2-9b8b0000)(nsUniqueId=aa750425-1dd111b2-80b5dab2-9b8b0000)(nsUniqueId=aa750427-1dd111b2-80b5dab2-9b8b0000)(nsUniqueId=aa75042f-1dd111b2-80b5dab2-9b8b0000)(nsUniqueId=a4cbee07-1dd111b2-80b6dab2-9b8b0000)(nsUniqueId=1015c002-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=579c4c06-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=579c4c0d-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=579c4c23-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=7b5f920d-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=7b5f9212-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=2e2ff003-1dd211b2-80b7dab2-9b8b0000)(nsUniqueId=4c4a2007-1dd..." attrs=ALL
[14/Jun/2017:16:24:05.166657721 +0200] conn=291 op=3 RESULT err=0 tag=101 nentries=427 etime=94.357000
[14/Jun/2017:16:24:05.167908246 +0200] conn=291 op=4 UNBIND
[14/Jun/2017:16:24:05.168421861 +0200] conn=291 op=4 fd=266 closed - U1

  6. Repeat the previous search on 389ds v1.3.4.14 + latest branch commits; it takes about 19 seconds, so in 1.3.6.6 we have a fivefold increase in search time.

[14/Jun/2017:16:27:04 +0200] conn=755 op=3 SRCH base="dc=id,dc=polytechnique,dc=edu" scope=2 filter="(|(modifyTimestamp>=20170614075000.0Z)(|(nsUniqueId=5ff1b313-1dd211b2-800db072-06120000)(nsUniqueId=90372806-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=90372808-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=9037280c-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=b3fa6e04-1dd111b2-80addab2-9b8b0000)(nsUniqueId=d7bdb402-1dd111b2-80addab2-9b8b0000)(nsUniqueId=d7bdb404-1dd111b2-80addab2-9b8b0000)(nsUniqueId=fb80fa09-1dd111b2-80addab2-9b8b0000)(nsUniqueId=f02ece05-1dd111b2-80afdab2-9b8b0000)(nsUniqueId=aa750425-1dd111b2-80b5dab2-9b8b0000)(nsUniqueId=aa750427-1dd111b2-80b5dab2-9b8b0000)(nsUniqueId=aa75042f-1dd111b2-80b5dab2-9b8b0000)(nsUniqueId=a4cbee07-1dd111b2-80b6dab2-9b8b0000)(nsUniqueId=1015c002-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=579c4c06-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=579c4c0d-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=579c4c23-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=7b5f920d-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=7b5f9212-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=2e2ff003-1dd211b2-80b7dab2-9b8b0000)(nsUniqueId=4c4a2007-1dd..." attrs=ALL
[14/Jun/2017:16:27:25 +0200] conn=755 op=3 RESULT err=0 tag=101 nentries=425 etime=19.556000
[14/Jun/2017:16:27:25 +0200] conn=755 op=4 UNBIND
[14/Jun/2017:16:27:25 +0200] conn=755 op=4 fd=256 closed - U1

Actual results

In itself it is already strange that filter x takes 0.6s, filter y takes 0.4s, and filter (|(x)(y)) takes 19 seconds; but between 1.3.4 and 1.3.6 this filter's search time increases from 19 to 95 seconds.

The workaround I have used: I rewrote the synchronization scripts to do two LDAP searches instead of one and then iterate over the union of the results. However, this fivefold increase in search time between versions may hide something important in the server code.

Expected results

There should be no fivefold performance regression from 1.3.4 to 1.3.6.
Even better, if possible, the (|(x)(y)) filter should take only a little more time than the time of (x) plus the time of (y). In these tests it takes 20 times (1.3.4) or 100 times (1.3.6) more.


Metadata Update from @mreynolds:
- Custom field type adjusted to defect
- Issue set to the milestone: 1.3.6.0

8 years ago
Actual results

In itself it is already strange that filter x takes 0.6s, filter y takes 0.4s, and filter (|(x)(y)) takes 19 seconds; but between 1.3.4 and 1.3.6 this filter's search time increases from 19 to 95 seconds.

I think there are two issues here. One is that we have an issue with handling the result sets from (x) and (y), which causes the additional 18s delay.

However, the issue of increasing search time seems more severe. I want to confirm that you have copied the same tuning (thread and memory) settings between the systems so that we can eliminate any discrepancies.

Yes, it's a VM. The values for threads:
nsslapd-maxthreadsperconn: 5
and I have put nsslapd-threadnumber: 30 for both versions to avoid 1.3.6 thread autotuning and compare results. Memory autotuning is also disabled and the parameters allow all the data to fit in the memory cache, so no disk swapping happens:

cn=config,cn=ldbm database,cn=plugins,cn=config
nsslapd-dbcachesize: 1073741824
cn=userRoot,cn=ldbm database,cn=plugins,cn=config
nsslapd-cachememsize: 536870912
cn=monitor,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
currententrycachesize: 280307283
maxentrycachesize: 536870912

Hi,

I can't reproduce this. I have 100,000 entries. I perform:

time ldapsearch -H ldap://localhost -x -b 'dc=example,dc=com' -D 'cn=Directory manager' -w password '(modifyTimestamp>=20161104045909Z)'
0.00s user 0.01s system 1% cpu 0.903 total
time ldapsearch -H ldap://localhost -x -b 'dc=example,dc=com' -D 'cn=Directory manager' -w password '(nsUniqueId=0ca594fc-a24511e6-8523b611-35c5abf3)'
0.00s user 0.00s system 61% cpu 0.006 total
time ldapsearch -H ldap://localhost -x -b 'dc=example,dc=com' -D 'cn=Directory manager' -w password '(|(modifyTimestamp>=20161104045909Z)(nsUniqueId=0ca594fc-a24511e6-8523b611-35c5abf3))'
0.00s user 0.01s system 0% cpu 1.562 total

There is certainly a discrepancy between the or and the single filters, but not the amounts you are reporting. If I change the modify timestamp query to match all 100,000 objects:

time ldapsearch -H ldap://localhost -x -b 'dc=example,dc=com' -D 'cn=Directory manager' -w password '(|(modifyTimestamp>=0)(nsUniqueId=0ca594fc-a24511e6-8523b611-35c5abf3))' &> /tmp/out
   0.93s user 0.35s system 15% cpu 8.000 total
time ldapsearch -H ldap://localhost -x -b 'dc=example,dc=com' -D 'cn=Directory manager' -w password '(modifyTimestamp>=0)' &> /tmp/out
1.00s user 0.40s system 18% cpu 7.746 total

Certainly there is a small difference when we add the or to the search, but even selecting 100,000 objects I'm not seeing the issues here.

Do you mind checking your entry cache sizes and dbcachesizes for me? I want to know if they are identical between the two machines (in theory, 1.3.6 should be larger due to the new autotuning feature). I suspect that perhaps what you might be seeing is cache eviction because your set size is so large that it can't all fit in the cache, so you are being penalised for it.

As for cache sizes, I have disabled autotuning [since it would waste a huge amount of memory on a small NetscapeRoot suffix] and use the same values (see my comment https://pagure.io/389-ds-base/issue/49290#comment-445260) in both 1.3.4 and 1.3.6. There are 0 evicted entries; the cache is large enough to hold all the entries and indexes in memory:
cn=monitor,cn=ldbm database,cn=plugins,cn=config
objectClass: top
objectClass: extensibleObject
cn: monitor
database: ldbm database
dbcachehits: 1522350
dbcachetries: 1537659
dbcachehitratio: 99
dbcachepagein: 15309
dbcachepageout: 68
dbcacheroevict: 0
dbcacherwevict: 0

cn=database,cn=monitor,cn=ldbm database,cn=plugins,cn=config
nsslapd-db-page-ro-evict-rate: 0
nsslapd-db-page-rw-evict-rate: 0

cn=monitor,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
objectClass: top
objectClass: extensibleObject
cn: monitor
database: ldbm database
readonly: 0
entrycachehits: 364767387
entrycachetries: 366740011
entrycachehitratio: 99
currententrycachesize: 288374827
maxentrycachesize: 536870912
currententrycachecount: 40115
maxentrycachecount: -1
dncachehits: 280650
dncachetries: 320727
dncachehitratio: 87
currentdncachesize: 4539142
maxdncachesize: 16777216
currentdncachecount: 40115
maxdncachecount: -1
normalizeddncachetries: 63347204
normalizeddncachehits: 63117400
normalizeddncachemisses: 229804
normalizeddncachehitratio: 99
currentnormalizeddncachesize: 20000597
maxnormalizeddncachesize: 20971520
currentnormalizeddncachecount: 65910

As for the performance check, you are not using the same filter as me - you use only one sub-component (nsUniqueId=0ca594fc-a24511e6-8523b611-35c5abf3)... In this case there is no perceptible difference, since only two components are present in the filter; of course we get the same results on our servers. As described in my initial comment, for the problem to be seen clearly you need a very long filter including about 400-500 "or" components:

(|(nsUniqueId=0ca594fc-a24511e6-8523b611-35c5abf3)(nsUniqueId=<value of some nsUniqueId of a second entry>)(nsUniqueId=...) (nsUniqueId=...))

Since nsUniqueId is indexed, this huge filter with 400-500 components is very fast and takes less than a second to complete (I check etime in the LDAP logs). If you add a single "or" component like (modifyTimestamp>=20161104045909Z) at the beginning of that filter, the time jumps to ~20s on 1.3.4.14 and ~95s on 1.3.6.6.

Okay, that makes more sense. Let me check this again with this information now. Thanks for following up.

@pj101 Okay, I think I understand the issue now. The issue is not the modify time, but the number of results that may be returned with it, and the number of filters you have.

In filterindex.c (ldbm) we apply the list of filters in order and union their results. In your separate queries this is quite fast. On one hand we have a single search term, so we are able to return immediately. On the other, each IDList from the nsuniqueid components is small, so the difference is hard to see.

However, when you add the two together, what happens is the code calls into list_candidates and starts to loop over the filters you have requested. On each iteration the result sets are unioned, but the quirk is that the result sets are ordered - as the result set grows, so does the time to create the next intermediate. This is what is causing the drastic slowdown in your results.
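To illustrate the cost model (a generic sketch under my own assumptions, not the actual filterindex.c code): the union of two sorted ID lists is a linear merge, so folding ~500 component lists into one accumulator pairwise re-walks the ever-growing accumulator on every step, and the total work grows quadratically with the number of components.

/* Hypothetical merge-union of two sorted ID lists. Folding k lists
 * pairwise calls this k-1 times, and each call walks the entire
 * accumulated result built so far. */
#include <stdlib.h>

typedef struct idlist { size_t nids; unsigned long *ids; } idlist;

static idlist *idl_union(const idlist *a, const idlist *b)
{
    idlist *r = malloc(sizeof(*r));
    r->ids = malloc((a->nids + b->nids) * sizeof(*r->ids));
    size_t i = 0, j = 0, n = 0;
    while (i < a->nids && j < b->nids) {
        if (a->ids[i] < b->ids[j])
            r->ids[n++] = a->ids[i++];
        else if (a->ids[i] > b->ids[j])
            r->ids[n++] = b->ids[j++];
        else {
            r->ids[n++] = a->ids[i++]; /* present in both: keep once */
            j++;
        }
    }
    while (i < a->nids) r->ids[n++] = a->ids[i++];
    while (j < b->nids) r->ids[n++] = b->ids[j++];
    r->nids = n;
    return r;
}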

Interestingly, one approach may be to put the modifyTimestamp test at the end of your query, so that merging with the large set is "avoided" until the end.

The ultimate issue here, though, is the inefficient way we are performing the set manipulations to create the final result sets. I'm investigating a fix for this now.

Ok, that design might be the reason for the large search time compared to each subfilter. But why is it 5 times slower in 1.3.6 than in 1.3.4, considering the filter processing design is the same?

Can you send me the extract from the access log at the time of the search that is causing you the problems? I want to check something, if that's okay.

I suspect that it's a regression in full table scan performance - the idlist issue on a fully indexed search may be separate (but I'm going to fix it anyway).

Sure. A small extract from the access logs for each case is in my very first post; I will do it again to attach larger extracts.

Ohhh sorry, I missed that. I'll be sure to check them.

I want to confirm with you: does modifyTimestamp have indexing applied, and what is your nsslapd-idlistscanlimit in cn=config?

We have about 40k entries, nsslapd-lookthroughlimit and nsslapd-idlistscanlimit are adjusted for that value:

cn=config,cn=ldbm database,cn=plugins,cn=config
nsslapd-lookthroughlimit: 50000
nsslapd-idlistscanlimit: 50000

The attribute modifyTimestamp is not indexed explicitly in our configuration; it's the default server setting. But notes=U does not appear in the search result, so maybe it is indexed internally by the server:
[22/Jun/2017:10:42:20.040297004 +0200] conn=2120 op=2 RESULT err=0 tag=97 nentries=0 etime=0.002000 dn="uid=andrey.ivanov,ou=personnel,ou=utilisateurs,dc=id,dc=polytechnique,dc=edu"
[22/Jun/2017:10:42:20.040952426 +0200] conn=2120 op=3 SRCH base="dc=id,dc=polytechnique,dc=edu" scope=2 filter="(modifyTimestamp>=20170614075000.0Z)" attrs=ALL
[22/Jun/2017:10:42:20.991524180 +0200] conn=2120 op=3 RESULT err=0 tag=101 nentries=2136 etime=0.950000

While I have been testing this, I've noticed that asking for modifyTimestamp as a range request forces it to fall back to ALLIDS, i.e. a full table scan. So I think that indexing modifyTimestamp is step one to resolving this. I think the current summary of the issue is:

  • you need to index modifyTimestamp
  • there is a performance issue in indexed searches related to intersections and unions (I'm working on this)
  • full table scans (allids) have had a regression of some nature
  • when an un-indexed range request is made, notes=U is not set.

Not sure there is an index type for range requests. According to https://access.redhat.com/documentation/en-us/red_hat_directory_server/10/html-single/configuration_command_and_file_reference/#nsIndexType
there are 6 types of indexes (nsIndexType):
pres = presence index
eq = equality index
approx = approximate index
sub = substring index
matching rule = international index
index browse = browsing index

Is it "browsing index" that will helped ranged searches (attr>=smth) ?

I think it's an eq index that is required if I recall correctly, but I may need to ask @lkrispen or @mreynolds to be sure. Can you try with eq for now and see if that helps? Remember to run db2index after you apply the eq index.
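As an example, a sketch of what such an index entry could look like (the userRoot backend name is an assumption; adjust it to your configuration, and note that db2index options vary between versions):

dn: cn=modifytimestamp,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
objectClass: top
objectClass: nsIndex
cn: modifytimestamp
nsSystemIndex: false
nsIndexType: eq

# then rebuild the new index offline, e.g.:
# db2index -n userRoot -t modifytimestamp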

Yes, I can confirm that with eq indexing the time of the search goes down from 90s to 0.99s. So the problem is indeed somewhere in the full table scan changes from 1.3.4 to 1.3.6.

Awesome, great to confirm that. Are you willing to test patches that we provide here to see if we can improve the performance some more?

Yes, with pleasure :) Just send me the patches against the 1.3.6 branch if possible; I use that branch in production and in tests.

0001-Ticket-49290-improve-idl-handling-in-complex-searche.patch

Second patch that rewrites list_candidates with an idl_set type to improve intersection performance, and to get ready to improve union performance. Additionally, this should improve NOT filter performance. This has been tested with the ticket and suite tests and has a 100% pass, along with adding a new filter test. Finally, it has been tested under ASAN and has no leaks or crashes.

The patch to set 'notes=U' looks good. I just wonder if it should not also call pagedresults_set_unindexed, so that a paged-result search using an unindexed range will get 'notes=U,P'.

@tbordaz I took guidance from other areas of the code - other locations don't set pagedresults_set_unindexed in this case either. It looks like in ldbm_search, once we are done, if notes=U/A we set P if pagedsearch is true:

https://pagure.io/389-ds-base/blob/master/f/ldap/servers/slapd/back-ldbm/ldbm_search.c#_904

I am ok with the first patch and you have my ACK.

Regarding the second patch it is just great reading !! congratulations.
I admit I had a hard time understanding the various algorithms (like the fact that idl_set_union and idl_set_intersect consume the list of IDLs, or the logic of jumping over the min IDL in intersect). They are great.

Something I was unsure about is the removal of FILTER_TEST_THRESHOLD. IMHO it could be a good accelerator, skipping additional intersections. Why not keep it in idl_set_intersect when length(idl_set->min) <= threshold?

Second thought about the second patch.
I am under the impression that a filter '(&(!(comp_1)))' will create an idl_set with count=2 and min=NULL.
If it calls idl_set_intersect, it will return NULL where it should return ALLIDS.
Am I missing something?

Tested by patching the current 1.3.6 branch. The "notes=U" is now correctly added to the ranged non-indexed search.
The filter used in 1.3.6 takes the same amount of time with and without the patch (~90s for the unindexed search):
[28/Jun/2017:14:26:02.735701306 +0200] conn=1 op=3 SRCH base="dc=id,dc=polytechnique,dc=edu" scope=2 filter="(|(modifyTimestamp>=20170614075000.0Z)(|(nsUniqueId=5ff1b313-1dd211b2-800db072-06120000)(nsUniqueId=90372806-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=90372808-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=9037280c-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=b3fa6e04-1dd111b2-80addab2-9b8b0000)(nsUniqueId=d7bdb402-1dd111b2-80addab2-9b8b0000)(nsUniqueId=d7bdb404-1dd111b2-80addab2-9b8b0000)(nsUniqueId=fb80fa09-1dd111b2-80addab2-9b8b0000)(nsUniqueId=f02ece05-1dd111b2-80afdab2-9b8b0000)(nsUniqueId=aa750425-1dd111b2-80b5dab2-9b8b0000)(nsUniqueId=aa750427-1dd111b2-80b5dab2-9b8b0000)(nsUniqueId=aa75042f-1dd111b2-80b5dab2-9b8b0000)(nsUniqueId=a4cbee07-1dd111b2-80b6dab2-9b8b0000)(nsUniqueId=1015c002-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=579c4c06-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=579c4c0d-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=579c4c23-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=7b5f920d-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=7b5f9212-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=2e2ff003-1dd211b2-80b7dab2-9b8b0000)(nsUniqueId=4c4a2007-1dd..." attrs=ALL
[28/Jun/2017:14:27:39.377178237 +0200] conn=1 op=3 RESULT err=0 tag=101 nentries=543 etime=96.645000 notes=U

After some operations the server crashes with the filter "(&(entryusn>=178)(&(!(ou=))(departmentNumber=)))". I can build a debug version and get the threads from this crash if you need them.

Second thought about the second patch.
I am under the impression that a filter '(&(!(comp_1)))' will create an idl_set with count=2 and min=NULL.
If it calls idl_set_intersect, it will return NULL where it should return ALLIDS.
Am I missing something?

I think it would have idl_set.count == 1 and return head, which is also NULL and not correct

Something I was unsure about is the removal of FILTER_TEST_THRESHOLD. IMHO it could be a good accelerator, skipping additional intersections. Why not keep it in idl_set_intersect when length(idl_set->min) <= threshold?

I noticed that as well, and we could follow your suggestion of handling it in intersect(), or take advantage of knowing that all filter components matched and skip filter evaluation if possible.

@pj101 Is that with both patches? Or just the first one?

@tbordaz, @lkrispen I did remove the filter test threshold, and didn't see a great change in performance. Without it we already post faster intersections than previously. I can add it back to see if it helps or not. It can go in the same condition as checking for if idl->b_nids == 0, because it's almost the same logic then.

With the filter '(&(!(comp_1)))':

DS actually internally expands this to '(|(objectClass=referral)(&(!(comp_1))))', which makes the answer more fun ;)

So the outer |() creates an idl_set, into which are placed an empty idl and the result of (&(!(comp_1))).

For (&(!(comp_1))), one of two things happens here. Either the filter optimiser reduces it to:

!(comp_1) because ! is a special case of &.

Or we just get the &().

In the first case, we would indeed create an idl_set, and insert the result of !().

!() is treated as an AND case. We create the next idl_set here. Because comp_1 is the first filter, we insert ALLIDS into the set, then we add the result of comp_1 as eq. Because we have ALLIDS and comp_1, we have count == 2, so we then skip the intersection and execute the complement.

We then recurse back out handing the idl back at each step.
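For illustration, a generic sketch of such a complement step over sorted ID lists (my own simplification, not the server's idl_notin):

#include <stddef.h>

/* Hypothetical: write cand minus excl to out; all arrays sorted ascending.
 * Returns the number of IDs kept. */
static size_t idl_subtract(const unsigned long *cand, size_t ncand,
                           const unsigned long *excl, size_t nexcl,
                           unsigned long *out)
{
    size_t i = 0, j = 0, n = 0;
    while (i < ncand) {
        while (j < nexcl && excl[j] < cand[i])
            j++;                  /* skip excluded IDs below cand[i] */
        if (j < nexcl && excl[j] == cand[i])
            i++;                  /* cand[i] is excluded: drop it */
        else
            out[n++] = cand[i++]; /* keep it */
    }
    return n;
}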

I hope that helps :) if you want I can add this as a test case to the patch.

First patch only for notes=U on unindexed range searches.

commit 0548490
To ssh://git@pagure.io/389-ds-base.git
19b0f43..0548490 master -> master

@firstyear The stacktrace of the segmentation fault (in idl_set.c:92) and the performance test I've made are with both patches, of course.

@firstyear The stacktrace of the segmentation fault (in idl_set.c:92) and the performance test I've made are with both patches, of course.

Hi Andrej, I tried to reproduce the crash using the same filter pattern but failed; it will be data dependent. Could you please provide additional info:
- what indexes are defined for entryusn, ou, departmentnumber
- what is the result of the individual searches
-- (entryusn>=178)
-- (&(entryusn>=178))
-- (ou=)
-- (!(ou=))
-- (&(!(ou=)))
-- (departmentNumber=)

and maybe you can enable trace logging for the search that crashes the server

Interestingly enough, the segfault does NOT happen when you make a single search with this filter. It looks like it's a whole search sequence that produces this result. The search with this filter after a server restart works without problem. It crashes in our script that uses multiple searches.
I've managed to crash it with other filters. Usually they all have an entryusn>=smth component.
The simplest filter I've found that produces the segfault is (entryusn>=189). So I think it has nothing to do with other filter components, but it has something to do with several sequential searches or timing. The stacktrace for the (entryusn>=189) segfault is attached to the message.

The current index on entryusn is the default one:
cn=entryusn,cn=default indexes,cn=config,cn=ldbm database,cn=plugins,cn=config
nsSystemIndex: true
nsIndexType: eq
nsMatchingRule: integerOrderingMatch

entryusn index file
dbscan -r -f /Local/dirsrv/var/lib/dirsrv/slapd-model/db/userRoot/entryusn.db | tail -n 10
=172
8726
=173
8960
=174
9020
=175
6597 24283
=190
5794

There is nothing in the LDAP error log. I will enable trace logging as you advise and see whether I can reproduce the segfault.

stacktrace-simple-filter-entryusn-larger-than-smth.txt

I have put 16384+1+4+32+65536, corresponding to:
Trace function calls. Logs a message when the server enters and exits a function
Heavy trace output debugging
Search filter processing.
Server plug-in debugging. It writes an entry to the log file when a server plug-in calls slapi-log-error

The segfault on (entryusn>=192) is reproducible (after a certain search sequence only). Here is the full access log from server start up to the crash (all with heavy debug tracing, so search times are huge):

[29/Jun/2017:12:29:13.571100296 +0200] conn=1 fd=256 slot=256 connection from local to /Local/dirsrv/var/run/slapd-model.socket
[29/Jun/2017:12:29:13.604151613 +0200] conn=1 AUTOBIND dn="cn=X LDAP Root"
[29/Jun/2017:12:29:13.604912483 +0200] conn=1 op=0 BIND dn="cn=X LDAP Root" method=sasl version=3 mech=EXTERNAL
[29/Jun/2017:12:29:13.610210401 +0200] conn=1 op=0 RESULT err=0 tag=97 nentries=0 etime=0.038000 dn="cn=X LDAP Root"
[29/Jun/2017:12:29:13.618050640 +0200] conn=1 op=1 MOD dn="cn=config"
[29/Jun/2017:12:29:13.890467523 +0200] conn=1 op=1 RESULT err=0 tag=103 nentries=0 etime=0.280000
[29/Jun/2017:12:29:13.892826256 +0200] conn=1 op=-1 fd=256 closed - B1
[29/Jun/2017:12:29:32.679421304 +0200] conn=2 fd=256 slot=256 connection from local to /Local/dirsrv/var/run/slapd-model.socket
[29/Jun/2017:12:29:32.723026450 +0200] conn=2 AUTOBIND dn="cn=X LDAP Root"
[29/Jun/2017:12:29:32.723787286 +0200] conn=2 op=0 BIND dn="cn=X LDAP Root" method=sasl version=3 mech=EXTERNAL
[29/Jun/2017:12:29:32.729059067 +0200] conn=2 op=0 RESULT err=0 tag=97 nentries=0 etime=0.049000 dn="cn=X LDAP Root"
[29/Jun/2017:12:29:32.744586910 +0200] conn=2 op=1 SRCH base="" scope=0 filter="(objectClass=*)" attrs="lastusn;userroot"
[29/Jun/2017:12:29:33.048490518 +0200] conn=2 op=1 RESULT err=0 tag=101 nentries=1 etime=0.316000
[29/Jun/2017:12:29:33.066071956 +0200] conn=2 op=2 SRCH base="ou=Utilisateurs,dc=id,dc=polytechnique,dc=edu" scope=2 filter="(&(objectClass=posixAccount)(objectClass=inetOrgPerson)(objectClass=X-Misc)(x-expirationdate;enseignement>=20170629102933Z))" attrs="1.1"
[29/Jun/2017:12:31:51.372440352 +0200] conn=2 op=2 RESULT err=0 tag=101 nentries=3201 etime=138.323000 notes=U
[29/Jun/2017:12:31:51.391328442 +0200] conn=2 op=3 SRCH base="cn=Acces Salles Informatiques,ou=Salles Informatiques,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu" scope=0 filter="(objectClass=groupofUniqueNames)" attrs="uniqueMember"
[29/Jun/2017:12:31:51.436711753 +0200] conn=2 op=3 RESULT err=0 tag=101 nentries=1 etime=0.063000
[29/Jun/2017:12:31:51.994377209 +0200] conn=2 op=4 SRCH base="ou=Utilisateurs,dc=id,dc=polytechnique,dc=edu" scope=2 filter="(&(objectClass=posixAccount)(objectClass=inetOrgPerson)(objectClass=X-Misc)(x-expirationdate;kelen>=20170629103151Z))" attrs="1.1"
[29/Jun/2017:12:33:32.987795576 +0200] conn=2 op=4 RESULT err=0 tag=101 nentries=226 etime=101.526000 notes=U
[29/Jun/2017:12:33:33.005545198 +0200] conn=2 op=5 SRCH base="cn=Acces Kelen,ou=Serveurs,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu" scope=0 filter="(objectClass=groupofUniqueNames)" attrs="uniqueMember"
[29/Jun/2017:12:33:33.044165722 +0200] conn=2 op=5 RESULT err=0 tag=101 nentries=1 etime=0.050000
[29/Jun/2017:12:33:33.093124078 +0200] conn=2 op=6 SRCH base="ou=Utilisateurs,dc=id,dc=polytechnique,dc=edu" scope=2 filter="(entryusn>=192)" attrs="1.1"

The end of the error log is in the attached trace.log file. If you need the whole trace file (~30M) from server startup, I can send it privately since it contains some sensitive data.

The place where it happens is always the same - ldap/servers/slapd/back-ldbm/idl_set.c:92
92 idl->next = idl_set->head;
while idl=0x0

trace.log
stacktrace-heavy-debug-tracing.txt

Here is the same debug trace for the same filter used just after the restart (trace-successful-search.txt). The main difference is here:

[29/Jun/2017:13:07:31.493475164 +0200] - DEBUG - idl_new_range_fetch - Bulk fetch buffer nids=1
[29/Jun/2017:13:07:31.494253948 +0200] - DEBUG - idl_new_range_fetch - =193 returns nids=1
[29/Jun/2017:13:07:31.495024241 +0200] - DEBUG - index_range_read_ext - idl->b_nids=1
[29/Jun/2017:13:07:31.495752023 +0200] - DEBUG - index_range_read_ext - idl->b_nmax=512
[29/Jun/2017:13:07:31.496503766 +0200] - DEBUG - index_range_read_ext - idl->b_ids[0]=5794
[29/Jun/2017:13:07:31.497257876 +0200] - DEBUG - index_range_read_ext - (entryusn,=) 1 candidates
[29/Jun/2017:13:07:31.498011449 +0200] - DEBUG - range_candidates - <= 1
[29/Jun/2017:13:07:31.498771199 +0200] - DEBUG - ava_candidates - <= 1
[29/Jun/2017:13:07:31.499515519 +0200] - DEBUG - filter_candidates_ext - <= 1
[29/Jun/2017:13:07:31.500264499 +0200] - DEBUG - filter_candidates_ext - =>
[29/Jun/2017:13:07:31.501023053 +0200] - DEBUG - filter_candidates_ext - EQUALITY

while in the segfault case it was:
[29/Jun/2017:12:33:33.124668693 +0200] - DEBUG - index_range_read_ext - (entryusn,=) 0 candidates
[29/Jun/2017:12:33:33.125448078 +0200] - DEBUG - range_candidates - <= 0
[29/Jun/2017:12:33:33.126202858 +0200] - DEBUG - ava_candidates - <= 0
[29/Jun/2017:12:33:33.126958174 +0200] - DEBUG - filter_candidates_ext - <= 0

So it looks like when several search operations are done in one connection, something perturbs the idl (sets it to 0x0).

trace-successful-search.txt

Looks like we have two issues here:
- the crash
- the crash's dependence on previous operations

I can reproduce the simple crash. If I have an eq-indexed attribute and do a range search (attr>=largestvalue+1), the range search returns NULL and
idl_set_insert_idl (idl_set=0x902b7e0, idl=0x0)
crashes.

But I do not see how to trigger it with a range search that would return IDs.

I think it may be the same issue - lastusn may be changed during the script execution (e.g. by writing lastLoginTime in user entries), and the search filter is changed dynamically depending on lastusn;userroot. If the attr>=largestvalue+1 segfault is fixed, I can test the new patch.

@tbordaz, @lkrispen I did remove the filter test threshold, and didn't see a great change in performance. Without it we already post faster intersections than previously. I can add it back to see if it helps or not. It can go in the same condition as checking for if idl->b_nids == 0, because it's almost the same logic then.

@firstyear It is not only a question of performance; unfortunately it can also change behaviour. E.g. in the lib389 test for ticket 48252 we do a search: (&(objectclass=nstombstone)(cn=test_user1))

cn is indexed, but for tombstones the tombstone entry is removed from this index. In the old version, since we have only one tombstone, the filter lookup for cn was skipped and the tombstone was found. With your patch the cn component is empty and the intersection is then empty as well.
Now I would say that this is not a well-designed test, but are we sure there are no other apps running into this?

@pj101 I'll investigate the cause of the crash as soon as possible. Something that may help is if you do the build with --enable-asan, and repeat the test. It gives better crash outputs in some cases.

@lkrispen Well, the problem here is that we are relying on that behaviour. Let's say I did (&(|(cn=a)(cn=b).....(cn=z))(objectclass=nstombstone)). Now I won't get any entries, because cn is indexed and the result set is larger than the FILTER_TEST threshold, so we'll get the wrong result anyway.

Certainly, we can add it back so the "small queries" work and pass, but I think it's an incorrect solution to the issue. I think that if we have nstombstone in a query, we probably need to do an allids search, given that the behaviour today is to strip tombstones from their related indexes. That would resolve this case and the large case I proposed, and it would have correct behaviour given the lack of indexing on tombstoned attrs. Perhaps that is a follow-up patch to correct that behaviour?

After some more research and tests I can confirm that the crash happens only with the search (entryUSN>=max_usn_value); the search sequence or history is irrelevant. Our script does that search because lastusn;userroot does not correspond to the real maximum value in the entryUSN index:

ldapsearch -x -h ldap-model.polytechnique.fr -b "" -s base '(objectClass=*)' 'lastusn;userroot'
dn:
lastusn;userroot: 176

dbscan -r -f /Local/dirsrv/var/lib/dirsrv/slapd-model/db/userRoot/entryusn.db | tail -n 10
=173
8960
=174
9020
=175
6597 24283

What is interesting, after a server restart the value of lastusn;userroot becomes 175, so after a server restart the same script and search do not crash the server. That is the explanation of the behavior described in the previous comments.

So, at the end of the day, there are actually only two completely independent problems:
1. an easily reproducible range search (attr>=largestvalue+1) crash related to this patch
2. the lastusn value at the suffix level does not correspond to the real last USN. I think it is related to the ticket I've filed about duplicate USNs in the index: https://pagure.io/389-ds-base/issue/49300

And by the way, when I enable ASAN no core dumps are produced. I guess there are some environment variables that need to be tweaked somewhere...

@pj101 thanks for all this work and for clarifying this. I was scared about an interdependence of operations which I could not explain. But we can easily reproduce the crash and will fix it.

And you are right, there are more issues with entryusn, but we should handle this in the other ticket.

@firstyear
William, the crash is easy to reproduce and explain. Just do a search for "sn>=ZZZZ".
It will call filter_candidates_ext() in list_candidates() and receive an empty list (tmp == NULL) since there are no entries matching the filter.
Then idl_set_insert_idl is called with a NULL idl and immediately tries to dereference it.

To fix it we either have to check idl in idl_set_insert_idl() or avoid calling it if tmp is NULL.
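A minimal sketch of the first option (simplified, hypothetical struct layout; the crash line corresponds to the idl_set.c:92 dereference described above):

#include <stddef.h>

typedef struct idlist { struct idlist *next; size_t b_nids; } idlist;
typedef struct idl_set { idlist *head; size_t count; } idl_set_t;

void idl_set_insert_idl(idl_set_t *idl_set, idlist *idl)
{
    if (idl == NULL) {
        return; /* no candidates for this component: nothing to insert */
    }
    idl->next = idl_set->head; /* the dereference that crashed on idl == 0x0 */
    idl_set->head = idl;
    idl_set->count++;
}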

@lkrispen I've been able to reproduce this now, thanks! I've added it to my filter logic test and will make a fix.

The answer is that we need the range candidates function to return an idl with b_nids of 0 for an empty set. Previously we used NULL to represent the empty set, but because of how the idl_set code works, we use a linked list through idl->next. If we get idl == NULL, we can't add it to the list as a result. I thought about making it handle this case and setting an empty-set-present flag, but it seemed like a premature optimisation. As a result, I decided to fix the places that return NULL. Now NULL should be an error case, and an idl is always returned, even if it's empty.

Arguably, this is just a hazard of the C language: we often confuse types like this. NULL means the absence of something, whereas here NULL is either an empty set or an error, and an idl can have b_nids == 0 (making it equivalent to NULL) or b_nids > 0.

Instead we should be clear about our representations of data - if it's NULL, it means we are missing the idl because something unknown happened. Otherwise, in all other cases we have an idl, and it can hold between 0 and ALLIDS candidates. From there it is much easier to process than adding more edge cases :)
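In other words, a sketch of the convention (hypothetical allocator name):

#include <stdlib.h>

typedef struct idlist { size_t b_nmax; size_t b_nids; struct idlist *next; } idlist;

/* An allocated, insertable "empty set" with zero candidates. Candidate
 * functions now return this instead of NULL; NULL is reserved for errors. */
static idlist *idl_empty(void)
{
    return calloc(1, sizeof(idlist)); /* b_nids == 0 */
}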

I'll update this patch soon, thanks,

@lkrispen As another point, I think I should open a new ticket for the nsTombstone search issue, namely that when oc=nstombstone is present we need to do an allids search. Do you agree?

@lkrispen As another point, I think I should open a new ticket for the nsTombstone search issue, namely that when oc=nstombstone is present we need to do an allids search. Do you agree?

No. We explicitly added an indexed tombstonecsn to avoid an allids search in tombstone purging. allids for tombstone purging is a no-go.

No more crashes with the new patch. No performance improvement for one unindexed ranged filter combined by "or" with many indexed equality subfilters (very first comment); the search time does not change much from 90s, meaning the same fivefold performance loss for this filter compared to 1.3.4.

Maybe this has nothing to do with filters but is an ACI problem. Could you run the searches as Directory Manager and compare the results of 1.3.4 and 1.3.6?

It could be a side effect of the RFE #48275, which changed the way ACI and filter evaluation was done in OR filters.

These searches are run as a user that has no limitations and has the same rights as Directory Manager (only nsUniqueId and modifyTimestamp are used in that filter anyway):
aci: (targetattr ="*")(version 3.0;acl "Directory Administrators Group";allow (all) (groupdn = "ldap:///cn=LDAP administration,...);)

But I have run the same search with the directory administrator user and you are right - the search time with the directory administrator is approximately the same as for 1.3.4 (22s for 1.3.6 and 18.6s for 1.3.4). So I think you have correctly pinpointed the underlying mechanics - ACI evaluation in "or" filters with some unindexed subfilters.

I uploaded a quick and dirty patch to address the issue. The basic idea is to avoid the re-evaluation of access to an attribute if we have already done so.
This patch does this for OR filters, but only has a "history" length of 1 and does not handle evaluation in nested filters.
To do it properly it would have to be done at the lowest level of the nesting, and the history would have to be passed somehow.
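A rough sketch of the idea (hypothetical names, not the actual patch): while walking one entry's OR filter list, remember the last attribute type whose access was checked and reuse the verdict for consecutive components on the same attribute.

#include <string.h>

typedef struct comp { const char *type; struct comp *f_next; } comp;

/* Stand-in for the real, expensive per-attribute ACI evaluation (assumed). */
extern int attr_access_allowed(const char *type);

static void eval_filter_access(comp *flist, int *access_out)
{
    const char *hist_type = NULL; /* "history" of length 1 */
    int hist_access = 0;
    int i = 0;
    for (comp *f = flist; f != NULL; f = f->f_next, i++) {
        if (hist_type != NULL && strcmp(hist_type, f->type) == 0) {
            access_out[i] = hist_access; /* same attr as before: reuse */
        } else {
            hist_access = attr_access_allowed(f->type); /* full ACI eval */
            hist_type = f->type;
            access_out[i] = hist_access;
        }
    }
}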

@lkrispen As another point, I think I should open a new ticket for the nsTombstone search issue, namely that when oc=nstombstone is present we need to do an allids search. Do you agree?

No. We explicitly added an indexed tombstonecsn to avoid an allids search in tombstone purging. allids for tombstone purging is a no-go.

Well, the behaviour you are talking about with the filter test threshold is a bug here. If we request (&(oc=nstombstone)(cn=bar)) we get a result, but if we do (&(oc=nstombstone)(|(cn=bar).....(cn=zar))) we'll get no result, because we exceeded the threshold.

So we need to do something to correct this behaviour; at the moment we rely on this quirk and no one complains because we search for singular entries.

I imagine the search you are mentioning for purging is (&(objectClass=nstombstone)(tombstonecsn= ....)) which is why you don't want oc=nstombstone to cause allids.

Some options could be:

  • index cn regardless of tombstone, and default to adding !(oc=nstombstone) to filters to exclude them (unless requested).
  • make the tombstonecsn filter not check objectClass
  • Have the oc=nstombstone default to allids unless it's an internal op
  • something else clever that I haven't thought of yet.

With regard to the patch you provided for the aci check:

  • why do we limit this just to equality filters?
type_has_access and type_has_no_access are single values, and are updated on each iteration of for ( f = flist; f != NULL; f = f->f_next ), so this means we only get the benefit if successive filters are of the same type. I.e., if we did:
(|(uid=....)(cn=....)(uid=....))

We only speed up uid= processing. So if someone re-arranged this:

(|(cn=....)(uid=....)(uid=....))

Even though there are more uid types whose access we should cache, cn= is the only one we speed up.

Finally, this may cause an ACI not to be evaluated if we have:

ou=People
aci: allow ....

uid=a,ou=People,....
aci: deny

uid=b,ou=People,.....

If we hit uid=a first, we would get a deny, and then we deny uid=b incorrectly. If we hit uid=b first, we would allow access to a because we don't re-evaluate the ACI on it.

@firstyear Hi, I think you miss a few points here.
- I addressed the limitations of this patch: it addresses only eq in OR filter lists, but as I said, to do it properly it would have to be done at the level where the real access test on the attribute is done. But this would mean that the "history" of evaluated attrs has to be passed to all the filter eval functions - too much effort to verify whether it addresses the current problem.
- the history length is only 1, but that could be changed
- the optimization is effective for any consecutive sequence of the same attribute. In (|(cn=....)(uid=....)(uid=....)) it would first evaluate access to cn and store cn in has_access; next it detects that uid != cn and evaluates uid and stores uid in has_access. But increasing the history length could also cover (|(uid=....)(cn=....)(uid=....))
- the evaluation is for filter access only and is done again for each entry; the result for entry a does not affect entry b

I have tested the patch from comment https://pagure.io/389-ds-base/issue/49290#comment-447205; it brings the performance of the search back on par with 1.3.4 (20s).

@pj101 thanks for testing.
If I understood correctly, indexing modifyTimestamp for equality gives even better performance, so we have some time to do this optimization properly, not just ad hoc for equality and OR filter lists.

@lkrispen It looks like with nsTombstone we only index objectClass=nsTombstone, nsUniqueId, nscpEntryDN, and nsTombstoneCSN. I think that if we add entryusn to this list, then we can leave the current behaviour in place in the idl_set patch. This would mean:

(&(objectclass=nsTombstone)(entryusn<=%s))
(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))
(&(%s<=%s)(objectclass=nsTombstone))", SLAPI_ATTR_TOMBSTONE_CSN

All work with the tombstones. It would also mean other queries still would not pick up the tombstones during a search op. This seems like the simplest solution.

I think the only side effect might be that in an entryusn search you would now see nsTombstones in the list of updates, but is that a bad change? Really, these are "part of the updates" that are occurring. Without the ACI to see them, they would be filtered anyway.

0001-Ticket-49290-improve-idl-handling-in-complex-searche.patch

@lkrispen This always adds the nsTombstones to the entryusn index, and alters a single test case to assert the correct behaviour. It passes all the test suites, including replication, and the ticket tests ds/dirsrvtests/tests/tickets/ticket48005_test.py and ds/dirsrvtests/tests/tickets/ticket48252_test.py, which both check USN as well as the USN tombstone clean task.

@pj101 thanks for testing.
If I understood correctly, indexing modifyTimestamp for equality gives even better performance, so we have some time to do this optimization properly, not just ad hoc for equality and OR filter lists.

Should we open a new ticket to improve aci evaluation performance for ALLIDS searches then?

@pj101 thanks for testing.
If I understood correctly, indexing modifyTimestamp for equality gives even better performance, so we have some time to do this optimization properly, not just ad hoc for equality and OR filter lists.

@lkrispen Yes, indexing modifyTimestamp resolves the original problem, resulting in better performance than full table scans. There is no hurry; I just wanted to dig out what happened between 1.3.4 and 1.3.6 to cause that performance hit. Now that we know it, we are at least aware of it and you can take your time.

@firstyear Is entryUSN now an obligatory attribute, so that we can count on its presence and indexing? I have always activated it explicitly using the USN plugin in the server configuration scripts.

@lkrispen I may have a follow-up patch to this which improves union and intersect behaviour again by reducing the number of allocations. Should we make that a second patch? Or are you happy for me to revise this (yet again)? :)

@lkrispen I may have a follow-up patch to this which improves union and intersect behaviour again by reducing the number of allocations. Should we make that a second patch? Or are you happy for me to revise this (yet again)? :)

one patch would be easier to review

Tweak to the patch to loosely sort incoming IDLs - this should now improve the search case that @pj101 has with the | filter.

I have tested this patch - it has the same timing for the initial filter with non-indexed modifyTimestamp as unpatched 1.3.6 (90s).

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.3.7.0 (was: 1.3.6.0)

7 years ago

@pj101 My patch affects the indexed modifyTimestamp case - not the unindexed case.

0001-Ticket-49290-improve-idl-handling-in-complex-searche.patch

Final version of the idl_set patch. This adds k-way union and intersection, as well as some basic case shortcuts. In single-threaded searches I have seen up to a 10% reduction in etime, but in multi-threaded load tests (ldclt) I see roughly 2%-4% depending on the query type.

An important aspect of this patch is that it makes complex filters more efficient to handle. This means filters with at least 3 & conditions or 3 | conditions. For filters that use 2 or fewer, this patch does not affect performance.

For example,

(&(objectClass=person)(|(uid=test0000001) ....))
(&(objectClass=organizationalPerson)(objectClass=posixAccount)(objectClass=person)(|(uid=test0000001) ... ))

We tend to see filters like the latter emitted by software like SSSD quite a lot, so improvements to this handling are going to help us across the board.

Another improvement of this is that filter operator order now does not affect performance. Previously these queries had a significant difference in processing time:

(|(modifyTimestamp>=X)(uid=X)(uid=....))
(|(uid=X)(uid=....)(modifyTimestamp>=X))

The second query would outperform the first, but with the k-way algorithm this is no longer the case.

Finally, in the case where you have ALLIDS in an | or an empty set in an &, we shortcut earlier without doing any list processing. This makes these operations faster because we don't do work only to throw it away.
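For illustration, a sketch of the k-way union idea (my own simplification; the actual patch may organise the merge differently): all sorted lists advance together in a single pass instead of being folded pairwise.

#include <stdlib.h>

typedef struct { size_t nids; const unsigned long *ids; } idlist_view;

/* out must hold the sum of all input lengths; returns the output count. */
static size_t idl_kway_union(const idlist_view *lists, size_t k, unsigned long *out)
{
    size_t *pos = calloc(k, sizeof(size_t)); /* one cursor per list */
    size_t n = 0;
    for (;;) {
        int found = 0;
        unsigned long min = 0;
        for (size_t i = 0; i < k; i++) { /* find smallest current head */
            if (pos[i] < lists[i].nids) {
                unsigned long v = lists[i].ids[pos[i]];
                if (!found || v < min) { min = v; found = 1; }
            }
        }
        if (!found)
            break; /* every list exhausted */
        out[n++] = min; /* min strictly increases, so no duplicates */
        for (size_t i = 0; i < k; i++) /* advance every cursor at min */
            if (pos[i] < lists[i].nids && lists[i].ids[pos[i]] == min)
                pos[i]++;
    }
    free(pos);
    return n;
}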

In the future, because of this design, there are now other locations to improve.

  • Apply the filter test threshold as a shortcut method in idl_set_intersection. If we have a single IDList that is < threshold, we can return the smallest IDList without having performed any intersections. Again, we reduce the amount of work we throw away.
  • Add a step to query optimisation that moves objectClass equality filters to the tail of the filter, i.e.:
    (&(objectClass=person)(uid=william)). This works in conjunction with the above. As we get the IDList from uid=william first, we would shortcut and apply the filtertest without needing to access the objectClass index files / IDList.
  • Add a query plan explanation which can show which elements of queries are unindexed or ALLIDS that may be causing odd behaviours that are not expected.

Metadata Update from @firstyear:
- Custom field reviewstatus adjusted to review

7 years ago

idl_compare is defined and implemented but not used

Yep, I was using it to assert in tests that my new code gave the same answers. I'll remove that and add the filter test threshold back in.

about allids and not: if we have a filter "(&(objectclass=*)(!(objectclass=organizationalunit)))"

and many entries, the first component is allids, and assume there is only one org unit.
we enter idl_set_intersect and detect that the result list is allids. Now we go to handle the not part. we have

(gdb) p result_list
$2 = {b_nmax = 0, b_nids = 10005, next = 0x0, itr = 0, b_ids = {0}}
(gdb) p idl_set
$1 = {count = 0, allids = 1, min_size = 0, total_size = 0, head = 0x0, complement_head = 0x7f30580017b0}
and we enter:
if (idl_set->complement_head != NULL && result_list->b_nids > 0)
and in idl_notin() run into this case:

if ( ALLIDS( a ) ) { / Not convinced that this code is really worth it /

I am convinced that this code is not worth it: (allids - idl) should be allids.


About the threshold: I don't like the ifdefs, and since we know whether FILTER_TEST_THRESHOLD is defined, the code should always work for

idl_set->minimum->b_nids <= FILTER_TEST_THRESHOLD

whether FILTER_TEST_THRESHOLD == 0 or > 0. I can even imagine we may someday want to make it configurable and just use the defined threshold as the default.
I would suggest something like:
- extend idl_set to have a field threshold
- in idl_set_create do #ifdef FILTER_TEST_THRESHOLD { threshold = FILTER_TEST_THRESHOLD }
- compare idl_set->minimum->b_nids <= idl_set->threshold
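A sketch of that suggestion (hypothetical field and initialisation, following the comment above rather than the actual code):

#include <stdlib.h>

#define FILTER_TEST_THRESHOLD 8 /* assumed default; the real value may differ */

typedef struct idl_set {
    /* ... existing members (head, minimum, count, ...) elided ... */
    size_t threshold;
} idl_set_t;

idl_set_t *idl_set_create(void)
{
    idl_set_t *s = calloc(1, sizeof(*s));
#ifdef FILTER_TEST_THRESHOLD
    s->threshold = FILTER_TEST_THRESHOLD; /* could later come from config */
#endif
    return s;
}

/* In idl_set_intersect the shortcut would then read:
 *   if (idl_set->minimum->b_nids <= idl_set->threshold)
 *       return the smallest IDL and let the filter test verify candidates.
 */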

I'm happy to fix the filter_test_threshold ifdef, but the allids-with-not is a separate issue imo - we can't keep piling improvements into this patch or else we'll never land it.

Let's get this in in its current form; then we can fix the allids not case. It's already the case that this code works as described; we are only looking because I happen to be working nearby :) It's great we are looking at this issue, but let's make it separate from the idl_set code.

I'm happy to fix the filter_test_threshold ifdef,

thanks

but the allids-with-not is a separate issue imo - we can't keep piling improvements into this patch or else we'll never land it.

Ok, I was under the impression that your patch had moved a shortcut out of the current implementation, but I was wrong.
Your handling of the NOT part is already better than before, by handling it after intersecting the other components. I had thought of a simple check before going into the NOT handling:

if (idl_is_allids(result_list)) return result_list;

but we can do this in a separate ticket, so I'm ACK with this.

commit 30c4f30
To ssh://git@pagure.io/389-ds-base.git
2339b25..30c4f30 master -> master

Thanks mate! Really appreciate your thorough review of this :)

@lkrispen In this patch that still leaves the ACI issue at hand. How do we want to progress on this?

@lkrispen In this patch that still leaves the ACI issue at hand. How do we want to progress on this?

I think we need two follow-up tickets: one for the NOT case and one for the ACI stuff.

@lkrispen Did we produce the ACI issue ticket from this?

Metadata Update from @firstyear:
- Custom field component adjusted to None
- Custom field origin adjusted to None
- Custom field version adjusted to None

7 years ago

Created #49456 for the ACI issue.

Do we still need something for the NOT case?

Metadata Update from @lkrispen:
- Custom field reviewstatus adjusted to ack (was: review)

7 years ago

@lkrispen For now let's focus on OR, and then check NOT as well? Maybe open a ticket to check the NOT ACI handling?

There is no NOT ACI problem; the perf problem for ACIs is in AND/OR with filter components using the same attribute repeatedly - and I have created the ticket for this.

I was unsure if we need something for NOT in the idl handling, and I think we should have another ticket for that.
And close this one as fixed.

Yep, sounds good to me. I think there could be improvements in NOT handling, but I'll need to investigate. I'll open that ticket up.

Closing this issue,

Metadata Update from @firstyear:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

7 years ago

389-ds-base is moving from Pagure to GitHub. This means that new issues and pull requests will be accepted only in 389-ds-base's GitHub repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/2349

If you want to receive further updates on the issue, please navigate to the GitHub issue and click on the subscribe button.

Thank you for understanding. We apologize for any inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: fixed)

4 years ago
