The same search filter applied to the same data (imported LDIF), on the same hardware and with the same indexes (the installation and import are scripted), results in ~19s search time on 389ds v1.3.4.14 and ~95s on 389ds v1.3.6.6. So the search time is multiplied by 5 (!!!). I discovered this when we put 1.3.6.6 (+ latest commits, compiled from source) into production - our synchronization scripts started getting LDAP_TIMELIMIT_EXCEEDED errors. The other problem I see in production (and haven't seen in tests) is the one that already has a recently opened ticket - https://pagure.io/389-ds-base/issue/49289
The server is recompiled from git sources (so 1.3.6.6 + all newer commits up to today). I have tried compiling with and without tcmalloc, and tried disabling and enabling nunc-stans; it does not change anything. So the problem probably comes from some other place, likely where time syntax comparison happens or where filters with "or" logic are handled. Maybe it has something to do with the new high-resolution time and an increased cost of time comparison somewhere.
cat /tmp/filt.txt
Check the LDAP logs, everything is fine, time ~0.6s: [14/Jun/2017:16:15:58.157246892 +0200] conn=281 op=3 SRCH base="dc=example,dc=com" scope=2 filter="(|(nsUniqueId=5ff1b313-1dd211b2-800db072-06120000)(nsUniqueId=90372806-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=90372808-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=9037280c-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=b3fa6e04-1dd111b2-80addab2-9b8b0000)(nsUniqueId=d7bdb402-1dd111b2-80addab2-9b8b0000)(nsUniqueId=d7bdb404-1dd111b2-80addab2-9b8b0000)(nsUniqueId=fb80fa09-1dd111b2-80addab2-9b8b0000)(nsUniqueId=f02ece05-1dd111b2-80afdab2-9b8b0000)(nsUniqueId=aa750425-1dd111b2-80b5dab2-9b8b0000)(nsUniqueId=aa750427-1dd111b2-80b5dab2-9b8b0000)(nsUniqueId=aa75042f-1dd111b2-80b5dab2-9b8b0000)(nsUniqueId=a4cbee07-1dd111b2-80b6dab2-9b8b0000)(nsUniqueId=1015c002-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=579c4c06-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=579c4c0d-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=579c4c23-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=7b5f920d-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=7b5f9212-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=2e2ff003-1dd211b2-80b7dab2-9b8b0000)(nsUniqueId=4c4a2007-1dd211b2-80b8dab2-9b8b0000)(nsUniqueId=b7..." attrs=ALL [14/Jun/2017:16:15:58.777008483 +0200] conn=281 op=3 RESULT err=0 tag=101 nentries=425 etime=0.623000 [14/Jun/2017:16:15:58.778350415 +0200] conn=281 op=4 UNBIND
Check the LDAP logs, everything is fine, time ~0.4s: [14/Jun/2017:16:18:19.707345434 +0200] conn=288 op=3 SRCH base="dc=example,dc=com" scope=2 filter="(modifyTimestamp>=20170614075000.0Z)" attrs=ALL [14/Jun/2017:16:18:20.089818639 +0200] conn=288 op=3 RESULT err=0 tag=101 nentries=3 etime=0.383000
ldapsearch -Y GSSAPI -h ldap-model.polytechnique.fr -b 'dc=example,dc=com' '(|(modifyTimestamp>=20170614075000.0Z)'`cat /tmp/filter-nsuniqueid.txt`')' > /dev/null
cat /tmp/filter-nsuniqueid.txt
Check the log, the new search takes 94 seconds. 14/Jun/2017:16:22:30.811592927 +0200] conn=291 op=3 SRCH base="dc=example,dc=com" scope=2 filter="(|(modifyTimestamp>=20170614075000.0Z)(|(nsUniqueId=5ff1b313-1dd211b2-800db072-06120000)(nsUniqueId=90372806-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=90372808-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=9037280c-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=b3fa6e04-1dd111b2-80addab2-9b8b0000)(nsUniqueId=d7bdb402-1dd111b2-80addab2-9b8b0000)(nsUniqueId=d7bdb404-1dd111b2-80addab2-9b8b0000)(nsUniqueId=fb80fa09-1dd111b2-80addab2-9b8b0000)(nsUniqueId=f02ece05-1dd111b2-80afdab2-9b8b0000)(nsUniqueId=aa750425-1dd111b2-80b5dab2-9b8b0000)(nsUniqueId=aa750427-1dd111b2-80b5dab2-9b8b0000)(nsUniqueId=aa75042f-1dd111b2-80b5dab2-9b8b0000)(nsUniqueId=a4cbee07-1dd111b2-80b6dab2-9b8b0000)(nsUniqueId=1015c002-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=579c4c06-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=579c4c0d-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=579c4c23-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=7b5f920d-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=7b5f9212-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=2e2ff003-1dd211b2-80b7dab2-9b8b0000)(nsUniqueId=4c4a2007-1dd..." attrs=ALL [14/Jun/2017:16:24:05.166657721 +0200] conn=291 op=3 RESULT err=0 tag=101 nentries=427 etime=94.357000 [14/Jun/2017:16:24:05.167908246 +0200] conn=291 op=4 UNBIND [14/Jun/2017:16:24:05.168421861 +0200] conn=291 op=4 fd=266 closed - U1
[14/Jun/2017:16:27:04 +0200] conn=755 op=3 SRCH base="dc=id,dc=polytechnique,dc=edu" scope=2 filter="(|(modifyTimestamp>=20170614075000.0Z)(|(nsUniqueId=5ff1b313-1dd211b2-800db072-06120000)(nsUniqueId=90372806-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=90372808-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=9037280c-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=b3fa6e04-1dd111b2-80addab2-9b8b0000)(nsUniqueId=d7bdb402-1dd111b2-80addab2-9b8b0000)(nsUniqueId=d7bdb404-1dd111b2-80addab2-9b8b0000)(nsUniqueId=fb80fa09-1dd111b2-80addab2-9b8b0000)(nsUniqueId=f02ece05-1dd111b2-80afdab2-9b8b0000)(nsUniqueId=aa750425-1dd111b2-80b5dab2-9b8b0000)(nsUniqueId=aa750427-1dd111b2-80b5dab2-9b8b0000)(nsUniqueId=aa75042f-1dd111b2-80b5dab2-9b8b0000)(nsUniqueId=a4cbee07-1dd111b2-80b6dab2-9b8b0000)(nsUniqueId=1015c002-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=579c4c06-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=579c4c0d-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=579c4c23-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=7b5f920d-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=7b5f9212-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=2e2ff003-1dd211b2-80b7dab2-9b8b0000)(nsUniqueId=4c4a2007-1dd..." attrs=ALL [14/Jun/2017:16:27:25 +0200] conn=755 op=3 RESULT err=0 tag=101 nentries=425 etime=19.556000 [14/Jun/2017:16:27:25 +0200] conn=755 op=4 UNBIND [14/Jun/2017:16:27:25 +0200] conn=755 op=4 fd=256 closed - U1
In itself it is already strange that filter x takes 0.6s, filter y takes 0.4s and filter (|(x)(y)) takes 19 seconds, but between 1.3.4 and 1.3.6 this filter search time increases from 19 to 95 seconds.
The workaround I have used: I have rewritten the synchronization scripts to do two LDAP searches instead of one and then iterate over the union of the results. However, this 5-fold increase in search time between versions may hide something important in the server code.
There should be no 5-fold performance regression from 1.3.4 to 1.3.6. Even better, if it is possible, the (|(x)(y)) filter should take only a little more time than the time of (x) plus the time of (y). In these tests it takes 20 (1.3.4) or 100 (1.3.6) times more.
Metadata Update from @mreynolds: - Custom field type adjusted to defect - Issue set to the milestone: 1.3.6.0
Actual results:
In itself it is already strange that filter x takes 0.6s, filter y takes 0.4s and filter (|(x)(y)) takes 19 seconds, but between 1.3.4 and 1.3.6 this filter search time increases from 19 to 95 seconds.
I think there are two issues here. One is that we have an issue with handling the result set from (x) and (y) which causes the additional 18s delay.
However, the issue of the increasing search time seems more severe. I want to confirm that you have copied the same tuning (thread and memory) settings between the systems, so that we can eliminate any discrepancies.
Yes, it's a VM. The values for threads: nsslapd-maxthreadsperconn: 5, and I have set nsslapd-threadnumber: 30 for both versions to avoid the 1.3.6 thread autotuning and compare results. Memory autotuning is also disabled and the parameters allow the full data set to fit in the memory cache, so no disk swapping happens:
cn=config,cn=ldbm database,cn=plugins,cn=config
nsslapd-dbcachesize: 1073741824

cn=userRoot,cn=ldbm database,cn=plugins,cn=config
nsslapd-cachememsize: 536870912

cn=monitor,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
currententrycachesize: 280307283
maxentrycachesize: 536870912
Hi,
I can't reproduce this. I have 100,000 entries. I perform:
time ldapsearch -H ldap://localhost -x -b 'dc=example,dc=com' -D 'cn=Directory manager' -w password '(modifyTimestamp>=20161104045909Z)'
0.00s user 0.01s system 1% cpu 0.903 total

time ldapsearch -H ldap://localhost -x -b 'dc=example,dc=com' -D 'cn=Directory manager' -w password '(nsUniqueId=0ca594fc-a24511e6-8523b611-35c5abf3)'
0.00s user 0.00s system 61% cpu 0.006 total

time ldapsearch -H ldap://localhost -x -b 'dc=example,dc=com' -D 'cn=Directory manager' -w password '(|(modifyTimestamp>=20161104045909Z)(nsUniqueId=0ca594fc-a24511e6-8523b611-35c5abf3))'
0.00s user 0.01s system 0% cpu 1.562 total
There is certainly a discrepancy between the or and the single filters, but not the amounts you are reporting. If I change the modify timestamp query to match all 100,000 objects:
time ldapsearch -H ldap://localhost -x -b 'dc=example,dc=com' -D 'cn=Directory manager' -w password '(|(modifyTimestamp>=0)(nsUniqueId=0ca594fc-a24511e6-8523b611-35c5abf3))' &> /tmp/out
0.93s user 0.35s system 15% cpu 8.000 total

time ldapsearch -H ldap://localhost -x -b 'dc=example,dc=com' -D 'cn=Directory manager' -w password '(modifyTimestamp>=0)' &> /tmp/out
1.00s user 0.40s system 18% cpu 7.746 total
Certainly there is a small difference when we add the or to the search, but even selecting 100,000 objects I'm not seeing the issues here.
Do you mind checking your entry cache sizes and dbcachesizes for me? I want to know if they are identical between the two machines (in theory, 1.3.6 should be larger due to the new autotuning feature). I suspect that perhaps what you might be seeing is cache eviction because your set size is so large that it can't all fit in the cache, so you are being penalised for it.
As for cache sizes, I have disabled autotuning [since it will waste a huge amount of memory on a small NetscapeRoot suffix] and use the same values (see my comment https://pagure.io/389-ds-base/issue/49290#comment-445260) in both 1.3.4 and 1.3.6. There are 0 evicted entries; the cache is large enough to hold all the entries and indexes in memory:

cn=monitor,cn=ldbm database,cn=plugins,cn=config
objectClass: top
objectClass: extensibleObject
cn: monitor
database: ldbm database
dbcachehits: 1522350
dbcachetries: 1537659
dbcachehitratio: 99
dbcachepagein: 15309
dbcachepageout: 68
dbcacheroevict: 0
dbcacherwevict: 0

cn=database,cn=monitor,cn=ldbm database,cn=plugins,cn=config
nsslapd-db-page-ro-evict-rate: 0
nsslapd-db-page-rw-evict-rate: 0

cn=monitor,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
objectClass: top
objectClass: extensibleObject
cn: monitor
database: ldbm database
readonly: 0
entrycachehits: 364767387
entrycachetries: 366740011
entrycachehitratio: 99
currententrycachesize: 288374827
maxentrycachesize: 536870912
currententrycachecount: 40115
maxentrycachecount: -1
dncachehits: 280650
dncachetries: 320727
dncachehitratio: 87
currentdncachesize: 4539142
maxdncachesize: 16777216
currentdncachecount: 40115
maxdncachecount: -1
normalizeddncachetries: 63347204
normalizeddncachehits: 63117400
normalizeddncachemisses: 229804
normalizeddncachehitratio: 99
currentnormalizeddncachesize: 20000597
maxnormalizeddncachesize: 20971520
currentnormalizeddncachecount: 65910
As for the performance check, you are not using the same filter as me - you use only one sub-component (nsUniqueId=0ca594fc-a24511e6-8523b611-35c5abf3)... In this case there is no perceptible difference, since only two components are present in the filter. Of course we see the same (fast) results on our servers with such a filter. As described in my initial comment, for the problem to be seen clearly you need a very long filter including about 400-500 "or" components:
(|(nsUniqueId=0ca594fc-a24511e6-8523b611-35c5abf3)(nsUniqueId=<value of some nsUniqueId of a second entry>)(nsUniqueId=...) (nsUniqueId=...))
Since nsUniqueId is indexed, this huge filter with 400-500 components is very fast and takes less than a second to complete (I check etime in the LDAP logs). If you add a single "or" component like (modifyTimestamp>=20161104045909Z) at the beginning of that filter, the time jumps to ~20s on 1.3.4 and ~95s on 1.3.6.6.
Okay, that makes more sense. Let me check this again with this information now. Thanks for following up.
@pj101 Okay, I think I understand the issue now. The issue is not the modify time, but the number of results that may be returned with it, and the number of filters you have.
In filterindex.c (ldbm) we apply the list of filters in order and union their results. In your separate queries this is quite fast. On one hand we have a single search term, so we are able to return immediately. On the other, each IDList from the nsuniqueid components is small, so the difference is hard to see.
However, when you add the two together, what happens is the code calls into list_candidates and starts to loop over the filters you have requested. On each iteration the result sets are unioned, but the quirk is that the result sets are ordered - as the result set grows, so does the time to create the next intermediate. This is what is causing the drastic slowdown in your results.
Interestingly, one approach may be to put the modifyTimestamp test at the end of your query, so that merging the large set is "avoided" until the end.
The ultimate issue here though, is the inefficient way we are performing the set manipulations to create the final result sets. I'm investigating a fix for this now.
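To make the shape of the problem concrete, here is a minimal standalone sketch (illustrative only, not the actual 389-ds code) of the pattern described above: each component's sorted ID list is unioned into a growing, ordered intermediate, and every union re-walks the whole accumulated list, so with k components the work grows roughly quadratically in k.

#include <stdio.h>
#include <stdlib.h>

/* Union two sorted ID arrays into a newly allocated sorted array. */
static unsigned *union_sorted(const unsigned *a, size_t an,
                              const unsigned *b, size_t bn, size_t *outn)
{
    unsigned *out = malloc((an + bn) * sizeof(unsigned));
    size_t i = 0, j = 0, n = 0;
    while (i < an && j < bn) {
        if (a[i] < b[j])      out[n++] = a[i++];
        else if (a[i] > b[j]) out[n++] = b[j++];
        else { out[n++] = a[i++]; j++; }
    }
    while (i < an) out[n++] = a[i++];
    while (j < bn) out[n++] = b[j++];
    *outn = n;
    return out;
}

int main(void)
{
    enum { COMPONENTS = 400, IDS_PER_COMPONENT = 100 };
    unsigned *result = NULL;
    size_t rn = 0;

    for (unsigned c = 0; c < COMPONENTS; c++) {
        unsigned ids[IDS_PER_COMPONENT];
        for (unsigned k = 0; k < IDS_PER_COMPONENT; k++)
            ids[k] = c * IDS_PER_COMPONENT + k;   /* pretend this came from an index lookup */

        /* The accumulated result is re-walked on every iteration. */
        size_t newn;
        unsigned *merged = union_sorted(result, rn, ids, IDS_PER_COMPONENT, &newn);
        free(result);
        result = merged;
        rn = newn;
    }
    printf("final candidate list: %zu ids\n", rn);
    free(result);
    return 0;
}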
OK, that design might be the reason for the large search time compared to each subfilter. But why is it 5 times slower in 1.3.6 than in 1.3.4, considering the filter processing design is the same?
Can you send me the extract from access log at the time of the search that is causing you the problems? I want to check something if that's okay.
I suspect that it's a regression in full table scan performance - the idlist issue on a fully indexed search may be separate (but I'm going to fix it anyway).
Sure. A small extract from the access logs for each case is in my very first post; I will do it again and attach larger extracts.
Ohhh sorry, I missed that. I'll be sure to check them.
<img alt="accesslog.txt" src="/389-ds-base/issue/raw/bc91eb3649dc9cc0a8929fa7e3118fc761bd884f32f43464593b3ba8bc890b62-accesslog.txt" />
The access log snippet for the abovementioned connection.
I want to confirm with you, does modifyTimeStamp have indexing applied, and what is your nsslapd-idlistscanlimit in cn=config?
We have about 40k entries, nsslapd-lookthroughlimit and nsslapd-idlistscanlimit are adjusted for that value:
cn=config,cn=ldbm database,cn=plugins,cn=config
nsslapd-lookthroughlimit: 50000
nsslapd-idlistscanlimit: 50000
The attribute modifyTimeStamp is not indexed explicitly in our configuration; it's the default server setting. But notes=U does not appear in the search result, so maybe it is indexed internally by the server:
[22/Jun/2017:10:42:20.040297004 +0200] conn=2120 op=2 RESULT err=0 tag=97 nentries=0 etime=0.002000 dn="uid=andrey.ivanov,ou=personnel,ou=utilisateurs,dc=id,dc=polytechnique,dc=edu"
[22/Jun/2017:10:42:20.040952426 +0200] conn=2120 op=3 SRCH base="dc=id,dc=polytechnique,dc=edu" scope=2 filter="(modifyTimestamp>=20170614075000.0Z)" attrs=ALL
[22/Jun/2017:10:42:20.991524180 +0200] conn=2120 op=3 RESULT err=0 tag=101 nentries=2136 etime=0.950000
While I have been testing this, I've noticed that asking for modifyTimestamp as a range request forces the search to fall back to allids, i.e. a full table scan. So I think that indexing modifyTimestamp is step one to resolving this. I think the current summary of the issue is:
Not sure there is an index type for range requests. According to https://access.redhat.com/documentation/en-us/red_hat_directory_server/10/html-single/configuration_command_and_file_reference/#nsIndexType there are 6 types of indexes (nsIndexType):
pres = presence index
eq = equality index
approx = approximate index
sub = substring index
matching rule = international index
index browse = browsing index
Is it the "browsing index" that will help range searches (attr>=smth)?
I think it's an eq index that is required if I recall correctly, but I may need to ask @lkrispen or @mreynolds to be sure. Can you try with eq for now and see if that helps? Remember to run db2index after you apply the eq index.
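For reference, the index definition is roughly an entry like the following (a sketch only - the backend name userRoot and the exact dn depend on your installation):

dn: cn=modifytimestamp,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
objectClass: top
objectClass: nsIndex
cn: modifytimestamp
nsSystemIndex: false
nsIndexType: eq

followed by reindexing the modifytimestamp attribute for that backend with db2index (or db2index.pl) so the existing entries are picked up.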
Yes, I can confirm that with eq indexing the search time goes down from 90s to 0.99s. So the problem is indeed somewhere in the full table scan changes from 1.3.4 to 1.3.6.
Awesome, great to confirm that. Are you willing to test patches that we provide here to see if we can improve the performance some more?
Yes, with pleasure :) Just send me the patches against the 1.3.6 branch if possible; I use that branch in production and in tests.
<img alt="0001-Ticket-49290-unindexed-range-searches-don-t-provide-.patch" src="/389-ds-base/issue/raw/8b29ce7a120a6bbd3c75f588bcdddeca11b1e6e81f324c8fc23bb30f95ae2063-0001-Ticket-49290-unindexed-range-searches-don-t-provide-.patch" />
First patch to mark unindexed range searches with notes=U
<img alt="0001-Ticket-49290-improve-idl-handling-in-complex-searche.patch" src="/389-ds-base/issue/raw/a3a0e4bfb013388f71f38fb9f492d9d497b84e9a2a49c89cc616fcc620605664-0001-Ticket-49290-improve-idl-handling-in-complex-searche.patch" />
Second patch that rewrites list_candidates with an idl_set type to improve intersection performance, and to get ready to improve union performance. Additionally, this should improve not filters perf. Finally, this has been tested with the ticket and suites tests and has a 100% pass, along with adding a new filter test. Finally, it has been tested under asan and has no leaks or crashes.
The patch to set 'notes=U' looks good. I just wonder if it should not also call pagedresults_set_unindexed, so that a paged result search using an unindexed range will get 'notes=U,P'.
@tbordaz I took guidance from other areas of the code - other locations don't set pagedresults_set_unindexed in this case either. It looks like in ldbm_search, once we are done, if notes=U/A we set P if pagedsearch is true:
https://pagure.io/389-ds-base/blob/master/f/ldap/servers/slapd/back-ldbm/ldbm_search.c#_904
I am ok with the first patch and you have my ACK.
Regarding the second patch, it is just great reading!! Congratulations. I admit I had a hard time understanding the various algorithms (like the fact that idl_set_union and idl_set_intersect consume the list of idls, or the logic of jumping over the min idl in intersect). They are great.
Something I was unsure about is the removal of FILTER_TEST_THRESHOLD. IMHO it could be a good accelerator, skipping additional intersections. Why not keep it in idl_set_intersect when length(idl_set->min) <= threshold?
Second thought about the second patch. I am under the impression that a filter '(&(!(comp_1)))' will create an idl_set with idl_set.count=2 and idl_set.min=NULL. If it calls idl_set_intersect, it will return NULL where it should return ALLIDS. Am I missing something?
Tested by patching the current 1.3.6 branch. The "notes=U" is now correclty added to the ranged non-indexed search. The filter used in 1.3.6 with patch and without patch takes the same amount of time (~90s for unindexes search): [28/Jun/2017:14:26:02.735701306 +0200] conn=1 op=3 SRCH base="dc=id,dc=polytechnique,dc=edu" scope=2 filter="(|(modifyTimestamp>=20170614075000.0Z)(|(nsUniqueId=5ff1b313-1dd211b2-800db072-06120000)(nsUniqueId=90372806-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=90372808-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=9037280c-1dd211b2-80acdab2-9b8b0000)(nsUniqueId=b3fa6e04-1dd111b2-80addab2-9b8b0000)(nsUniqueId=d7bdb402-1dd111b2-80addab2-9b8b0000)(nsUniqueId=d7bdb404-1dd111b2-80addab2-9b8b0000)(nsUniqueId=fb80fa09-1dd111b2-80addab2-9b8b0000)(nsUniqueId=f02ece05-1dd111b2-80afdab2-9b8b0000)(nsUniqueId=aa750425-1dd111b2-80b5dab2-9b8b0000)(nsUniqueId=aa750427-1dd111b2-80b5dab2-9b8b0000)(nsUniqueId=aa75042f-1dd111b2-80b5dab2-9b8b0000)(nsUniqueId=a4cbee07-1dd111b2-80b6dab2-9b8b0000)(nsUniqueId=1015c002-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=579c4c06-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=579c4c0d-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=579c4c23-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=7b5f920d-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=7b5f9212-1dd211b2-80b6dab2-9b8b0000)(nsUniqueId=2e2ff003-1dd211b2-80b7dab2-9b8b0000)(nsUniqueId=4c4a2007-1dd..." attrs=ALL [28/Jun/2017:14:27:39.377178237 +0200] conn=1 op=3 RESULT err=0 tag=101 nentries=543 etime=96.645000 notes=U
After some operations the server crashes with the filter "(&(entryusn>=178)(&(!(ou=))(departmentNumber=)))". I can build a debug version and get thread stacks from this crash if you need them.
I think it would have idl_set.count == 1 and return head, which is also NULL and not correct
I noticed that as well, and we could follow your suggestion of handling it in intersect(), or take advantage of knowing that all filter components matched and skip filter evaluation if possible.
Stacktrace is attached: stacktrace.28-Jun-2017_14h56m49s.txt
@pj101 Is that with both patches? Or just the first one?
@tbordaz, @lkrispen I did remove the filter test threshold, and didn't see a great change in performance. Without it we already post faster intersections than previously. I can add it back to see if it helps or not. It can go in the same condition as checking for if idl->b_nids == 0, because it's almost the same logic then.
With the filter '(&(!(comp_1)))':
DS actually internally expands this to '(|(objectClass=referral)(&(!(comp_1))))', which makes the answer more fun ;)
So the outer |() creates an idl_set, and into it are placed an empty idl and the result of (&(!(comp_1))).
For (&(!(comp_1))), one of two things happens here. Either the filter optimiser reduces it to:
!(comp_1) because ! is a special case of &.
Or we just get the &().
In the first case, we would indeed create an idl_set, and insert the result of !().
!() is treated as an and case. We create the next idl_set here. Because comp_1 is the first filter, we insert ALLIDS to the set, then we add the result of comp_1 as eq. Because we have ALLIDS and the comp_1, we have count == 2, so we then skip the intersection, and execute the complement.
We then recurse back out handing the idl back at each step.
I hope that helps :) if you want I can add this as a test case to the patch.
First patch only for notes=U on unindexed range searches.
commit 0548490 To ssh://git@pagure.io/389-ds-base.git 19b0f43..0548490 master -> master
@firstyear The stacktrace of the segmentation fault (in idl_set.c:92) and the performance test I've made are with both patches, of course.
Hi Andrej, I tried to reproduce the crash using the same filter pattern, but failed; it will be data dependent. Could you please provide additional info:
- what indexes are defined for entryusn, ou, departmentnumber
- what is the result of the individual searches:
-- (entryusn>=178)
-- (&(entryusn>=178))
-- (ou=)
-- (!(ou=))
-- (&(!(ou=)))
-- (departmentNumber=)
and maybe you can enable trace logging for the search that crashes the server
Interestingly enough, the segfault does NOT happen when you make a single search with this filter. It looks like it's a whole search sequence that produces this result. The search with this filter after a server restart works without problems. It crashes in our script that uses multiple searches. I've managed to crash it with other filters; usually they all have an entryusn>=smth component. The simplest filter that I've found to produce the segfault is as simple as (entryusn>=189). So I think it has nothing to do with the other filter components; it has something to do with several sequential searches or timing. The stacktrace for the (entryusn>=189) segfault is attached to this message.
The current index on entryusn is the default one:
cn=entryusn,cn=default indexes,cn=config,cn=ldbm database,cn=plugins,cn=config
nsSystemIndex: true
nsIndexType: eq
nsMatchingRule: integerOrderingMatch
entryusn index file:
dbscan -r -f /Local/dirsrv/var/lib/dirsrv/slapd-model/db/userRoot/entryusn.db | tail -n 10
=172 8726
=173 8960
=174 9020
=175 6597 24283
=190 5794
There is nothing in the LDAP error log. I will enable trace logging as you advise and see whether I can reproduce the segfault.
Attachment: stacktrace-simple-filter-entryusn-larger-than-smth.txt
I have set the error log level to 16384+1+4+32+65536, corresponding to: trace function calls (logs a message when the server enters and exits a function), heavy trace output debugging, search filter processing, and server plug-in debugging (writes an entry to the log file when a server plug-in calls slapi-log-error).
The segfault on (entryusn>=192) is reproducible (after a certain search sequence only). Here is the full access log after server start and up to the crash (all with heavy debug tracing, so the search times are huge):
[29/Jun/2017:12:29:13.571100296 +0200] conn=1 fd=256 slot=256 connection from local to /Local/dirsrv/var/run/slapd-model.socket [29/Jun/2017:12:29:13.604151613 +0200] conn=1 AUTOBIND dn="cn=X LDAP Root" [29/Jun/2017:12:29:13.604912483 +0200] conn=1 op=0 BIND dn="cn=X LDAP Root" method=sasl version=3 mech=EXTERNAL [29/Jun/2017:12:29:13.610210401 +0200] conn=1 op=0 RESULT err=0 tag=97 nentries=0 etime=0.038000 dn="cn=X LDAP Root" [29/Jun/2017:12:29:13.618050640 +0200] conn=1 op=1 MOD dn="cn=config" [29/Jun/2017:12:29:13.890467523 +0200] conn=1 op=1 RESULT err=0 tag=103 nentries=0 etime=0.280000 [29/Jun/2017:12:29:13.892826256 +0200] conn=1 op=-1 fd=256 closed - B1 [29/Jun/2017:12:29:32.679421304 +0200] conn=2 fd=256 slot=256 connection from local to /Local/dirsrv/var/run/slapd-model.socket [29/Jun/2017:12:29:32.723026450 +0200] conn=2 AUTOBIND dn="cn=X LDAP Root" [29/Jun/2017:12:29:32.723787286 +0200] conn=2 op=0 BIND dn="cn=X LDAP Root" method=sasl version=3 mech=EXTERNAL [29/Jun/2017:12:29:32.729059067 +0200] conn=2 op=0 RESULT err=0 tag=97 nentries=0 etime=0.049000 dn="cn=X LDAP Root" [29/Jun/2017:12:29:32.744586910 +0200] conn=2 op=1 SRCH base="" scope=0 filter="(objectClass=*)" attrs="lastusn;userroot" [29/Jun/2017:12:29:33.048490518 +0200] conn=2 op=1 RESULT err=0 tag=101 nentries=1 etime=0.316000 [29/Jun/2017:12:29:33.066071956 +0200] conn=2 op=2 SRCH base="ou=Utilisateurs,dc=id,dc=polytechnique,dc=edu" scope=2 filter="(&(objectClass=posixAccount)(objectClass=inetOrgPerson)(objectClass=X-Misc)(x-expirationdate;enseignement>=20170629102933Z))" attrs="1.1" [29/Jun/2017:12:31:51.372440352 +0200] conn=2 op=2 RESULT err=0 tag=101 nentries=3201 etime=138.323000 notes=U [29/Jun/2017:12:31:51.391328442 +0200] conn=2 op=3 SRCH base="cn=Acces Salles Informatiques,ou=Salles Informatiques,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu" scope=0 filter="(objectClass=groupofUniqueNames)" attrs="uniqueMember" [29/Jun/2017:12:31:51.436711753 +0200] conn=2 op=3 RESULT err=0 tag=101 nentries=1 etime=0.063000 [29/Jun/2017:12:31:51.994377209 +0200] conn=2 op=4 SRCH base="ou=Utilisateurs,dc=id,dc=polytechnique,dc=edu" scope=2 filter="(&(objectClass=posixAccount)(objectClass=inetOrgPerson)(objectClass=X-Misc)(x-expirationdate;kelen>=20170629103151Z))" attrs="1.1" [29/Jun/2017:12:33:32.987795576 +0200] conn=2 op=4 RESULT err=0 tag=101 nentries=226 etime=101.526000 notes=U [29/Jun/2017:12:33:33.005545198 +0200] conn=2 op=5 SRCH base="cn=Acces Kelen,ou=Serveurs,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu" scope=0 filter="(objectClass=groupofUniqueNames)" attrs="uniqueMember" [29/Jun/2017:12:33:33.044165722 +0200] conn=2 op=5 RESULT err=0 tag=101 nentries=1 etime=0.050000 [29/Jun/2017:12:33:33.093124078 +0200] conn=2 op=6 SRCH base="ou=Utilisateurs,dc=id,dc=polytechnique,dc=edu" scope=2 filter="(entryusn>=192)" attrs="1.1"
The end of the error log is in the attached trace.log file. If you need the whole trace file (~30M) from server startup, I can send it privately since it contains some sensitive data.
The place where it happens is always the same - ldap/servers/slapd/back-ldbm/idl_set.c:92:
idl->next = idl_set->head;
with idl=0x0
Attachments: trace.log, stacktrace-heavy-debug-tracing.txt
Here is the same debug trace for the same filter used just after the restart (trace-successful-search.txt). The main difference is here:
[29/Jun/2017:13:07:31.493475164 +0200] - DEBUG - idl_new_range_fetch - Bulk fetch buffer nids=1
[29/Jun/2017:13:07:31.494253948 +0200] - DEBUG - idl_new_range_fetch - =193 returns nids=1
[29/Jun/2017:13:07:31.495024241 +0200] - DEBUG - index_range_read_ext - idl->b_nids=1
[29/Jun/2017:13:07:31.495752023 +0200] - DEBUG - index_range_read_ext - idl->b_nmax=512
[29/Jun/2017:13:07:31.496503766 +0200] - DEBUG - index_range_read_ext - idl->b_ids[0]=5794
[29/Jun/2017:13:07:31.497257876 +0200] - DEBUG - index_range_read_ext - (entryusn,=) 1 candidates
[29/Jun/2017:13:07:31.498011449 +0200] - DEBUG - range_candidates - <= 1
[29/Jun/2017:13:07:31.498771199 +0200] - DEBUG - ava_candidates - <= 1
[29/Jun/2017:13:07:31.499515519 +0200] - DEBUG - filter_candidates_ext - <= 1
[29/Jun/2017:13:07:31.500264499 +0200] - DEBUG - filter_candidates_ext - =>
[29/Jun/2017:13:07:31.501023053 +0200] - DEBUG - filter_candidates_ext - EQUALITY
while in segfault it was:
[29/Jun/2017:12:33:33.124668693 +0200] - DEBUG - index_range_read_ext - (entryusn,=) 0 candidates
[29/Jun/2017:12:33:33.125448078 +0200] - DEBUG - range_candidates - <= 0
[29/Jun/2017:12:33:33.126202858 +0200] - DEBUG - ava_candidates - <= 0
[29/Jun/2017:12:33:33.126958174 +0200] - DEBUG - filter_candidates_ext - <= 0
So it looks like, when several search operations are done in one connection, something perturbs the idl (sets it to 0x0).
Attachment: trace-successful-search.txt
Looks like we have two issues here:
- the crash
- the crash depending on previous operations
I can reproduce the simple crash. If I have an eq indexed attribute and do a range search (attr>=largestvalue+1) the range search returns NULL and idl_set_insert_idl (idl_set=0x902b7e0, idl=0x0) crashes.
But I do not see how to trigger it with a range search which would return IDs.
I think it may be the same issue - lastusn may be changed during the script execution (e.g. by writing lastLoginTime in user entries) and the search filter is changed dynamically depending on lastusn;userroot. If the attr>=largestvalue+1 segfault is fixed, I can test the new patch.
@firstyear It is not only a question of performance; unfortunately it can also change behaviour. E.g. in the lib389 test for ticket 48252 we do a search: (&(objectclass=nstombstone)(cn=test_user1))
cn is indexed, but for tombstones the entry is removed from this index. In the old version, since we have only one tombstone, the filter lookup for cn was skipped and the tombstone found. With your patch the cn component is empty and the intersection is then empty as well. Now, I would say that this is not a well-designed test, but are we sure there are no other apps running into this?
@pj101 I'll investigate the cause of the crash as soon as possible. Something that may help is if you do the build with --enable-asan, and repeat the test. It gives better crash outputs in some cases.
@lkrispen Well, the problem here is that we are relying on that behaviour. Let's say I did (&(|(cn=a)(cn=b).....(cn=z))(objectclass=nstombstone)). Now I won't get any entries because cn is indexed and the result set is larger than the FILTER_TEST threshold, so we'll get the wrong result anyway.
Certainly, we can add it back for the "small queries" to work and pass, but I think it's an incorrect solution to the issue. I think that if we have nstombstone in a query, we probably need to do an allids search if the behaviour today is to strip them from their related indexes. That would resolve this case and the large case I proposed, and it would have correct behaviour given the lack of indexing on tombstoned attrs. Perhaps that is a follow up patch to correct that behaviour?
After some more research and tests I can confirm that the crash happens only with the search (entryUSN>=max_usn_value); the search sequence or history is irrelevant. Our script does that search because lastusn;userroot does not correspond to the real maximum value in the entryUSN index:
ldapsearch -x -h ldap-model.polytechnique.fr -b "" -s base '(objectClass=*)' 'lastusn;userroot'
dn:
lastusn;userroot: 176
dbscan -r -f /Local/dirsrv/var/lib/dirsrv/slapd-model/db/userRoot/entryusn.db | tail -n 10
=173 8960
=174 9020
=175 6597 24283
What is interesting, after a server restart the value of lastusn;userroot becomes 175, so after a server restart the same script and search do not crash the server. That is the explanation of the behavior described in the previous comments.
So, at the end of the day, there are actually only two completely independent problems:
1. an easily reproducible range search (attr>=largestvalue+1) crash related to this patch
2. the lastusn value at suffix level does not correspond to the real last USN; I think it is related to the ticket I've filed about duplicate USNs in the index: https://pagure.io/389-ds-base/issue/49300
And by the way, when I enable ASAN no core dumps are produced. I guess there are some environment variables that need to be tweaked somewhere...
@pj101 thanks for all this work and for clarifying this. I was scared about an interdependence of operations which I could not explain. But we can easily reproduce the crash and will fix it.
And you are right, there are more issues with entryusn, but we should handle this in the other ticket.
@firstyear William, the crash is easy to reproduce and explain. Just do a search for "sn>=ZZZZ". It will call filter_candidates_ext() in list_candidates() and receive an empty list, tmp == NULL, since there are no entries matching the filter. Then idl_set_insert_idl is called with a NULL idl and immediately tries to dereference it.
To fix it we either have to check idl in idl_set_insert_idl(), or avoid calling it if tmp is NULL.
@lkrispen I've been able to reproduce this now, thanks! I've added it to my filter logic test and will make a fix.
The answer is that we need the range candidates function on an empty set to return an idl with b_nids of 0. Previously we used a NULL to represent empty set, but because of how the idl_set code works, we use a linked list from idl->next. If we get idl == NULL, we can't add it to the list as a result. I thought about making it handle this case, and setting an empty set present flag, but it seemed like a premature optimisation. As a result, I decided to fix the places that return NULL. Now NULL should be an error case, and an idl is always returned, even if it's empty.
Arguably, this is just a flaw of the C language, that we often confuse types like this. NULL means the absence of something, whereas here NULL can mean an empty set or an error, and an idl can have b_nids == 0 (making it equivalent to NULL) or b_nids > 0.
Instead we should be clear about our representations of data - if it's NULL, it means we are missing the idl because something unknown happened. Otherwise, in all other cases we have an idl, and it can contain between 0 and ALLIDS candidates. From there it is much easier to process than adding more edge cases :)
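As a tiny standalone illustration of that convention (assumed, simplified types - not the real IDList code): an empty candidate set is a real idl with b_nids == 0, and NULL is reserved for errors, so the caller can always link the returned idl into the set.

#include <stdio.h>
#include <stdlib.h>

typedef struct idl {
    size_t b_nids;        /* number of candidate IDs; 0 means empty set       */
    struct idl *next;     /* the idl_set keeps idls on a linked list via next */
    unsigned b_ids[];     /* the candidate IDs themselves                     */
} idl_t;

/* Always return a valid idl for "no matches"; NULL would mean a real error. */
static idl_t *range_candidates(unsigned lower, unsigned upper)
{
    if (lower > upper) {
        /* e.g. entryusn >= largest_value + 1: nothing matches */
        return calloc(1, sizeof(idl_t));        /* empty set, b_nids == 0 */
    }
    size_t count = upper - lower + 1;
    idl_t *out = calloc(1, sizeof(idl_t) + count * sizeof(unsigned));
    out->b_nids = count;
    for (size_t i = 0; i < count; i++)
        out->b_ids[i] = lower + (unsigned)i;
    return out;
}

int main(void)
{
    idl_t *hit = range_candidates(190, 192);
    idl_t *miss = range_candidates(193, 192);   /* the >= max + 1 case */
    printf("hit b_nids=%zu, miss b_nids=%zu\n", hit->b_nids, miss->b_nids);
    /* Either way the caller can safely do idl->next = idl_set->head. */
    free(hit);
    free(miss);
    return 0;
}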
I'll update this patch soon, thanks,
This resolves the issue reported, and improves the error case for debugging when we do find a NULL idl.
<img alt="0001-Ticket-49290-improve-idl-handling-in-complex-searche.patch" src="/389-ds-base/issue/raw/10d44616864b4238c5f0459bfb89fe0ed8321fd97eda779aa8602762b942ff1c-0001-Ticket-49290-improve-idl-handling-in-complex-searche.patch" />
@lkrispen As another point, I think I should open a new ticket for the nsTombstone search issue, namely that when oc=nstombstone is present we need to do allids search. Do you agree?
No. We explicitly added an indexed tombstonecsn to avoid an allids search in tombstone purging. allids for tombstone purging is a no-go.
No more crashes with the new patch. There is no performance improvement for the unindexed range filter combined by "or" with many indexed equality subfilters (very first comment); the search time does not change much from 90s, meaning the same 5-fold performance loss for this filter compared to 1.3.4.
Maybe this has nothing to do with filters, but is an aci problem. Could you run the searches as directory manager and compare the results of 1.3.4 and 1.3.6?
It could be a side effect of the RFE #48275, which changed the way aci and filter evaluation is done in OR filters.
These searches are run as a user that has no limitations and has the same rights as directory manager (only nsUniqueId and modifyTimeStamp are used in that filter anyway):
aci: (targetattr ="*")(version 3.0;acl "Directory Administrators Group";allow (all) (groupdn = "ldap:///cn=LDAP administration,...);)
But I have run the same search with the directory administrator user and you are right - the search time with the directory administrator is approximately the same as on 1.3.4 (22s on 1.3.6 and 18.6s on 1.3.4). So I think you have correctly pinpointed the underlying mechanics - ACI evaluation in "or" filters with some unindexed subfilters.
<img alt="0001-quick-optimization-for-or-filter-with-many-component.patch" src="/389-ds-base/issue/raw/9700c98715eb0437af56fc0b2c3ca8bf34ffe350303bfe81f53762305bfc3714-0001-quick-optimization-for-or-filter-with-many-component.patch" />
I uploaded a quick and dirty patch to address the issue. The basic idea is to avoid the reavaluation of access to ab attribute if we already have done so. This patch does this for or filters, but only has a "history" length of 1 and does not handle evaluation in nested filters. To do it properly it would have to be done at the lowest level of the nesting and the history would have to be passed somehow
@lkrispen As another point, I think I should open a new ticket for the nsTombstone search issue, namely that when oc=nstombstone is present we need to do allids search. Do you agree?
No. We explicitly added an indexed tombstonecsn to avoid an allids search in tombstone purging. allids for tombstone purging is a no-go.
Well, the behaviour you are talking about with the filter test threshold is a bug here. If we request (&(oc=nstombstone)(cn=bar)) we get a result, but if we do (&(oc=nstombstone)(|(cn=bar).....(cn=zar))) we'll get no result because we exceeded the threshold.
So we need to do something to correct this behaviour, at the moment we rely on this quirk and no one complains because we search for singular entries.
I imagine the search you are mentioning for purging is (&(objectClass=nstombstone)(tombstonecsn= ....)) which is why you don't want oc=nstombstone to cause allids.
Some options could be:
With regard to the patch you provided for the aci check:
(|(uid=....)(cn=....)(uid=....))
We only speed up uid= processing. So if someone re-arranged this:
(|(cn=....)(uid=....)(uid=....))
Even though there are more uid components whose access we should cache, cn= is the only one we speed up.
Finally, this may cause an aci to not be evaluated if we have:
ou=People aci: allow .... uid=a,ou=People,.... aci: deny uid=b,ou=People,.....
If we hit uid=a first, we would get a deny, then we deny uid=b incorrectly. If we hit uid=b first, we would allow access to a because we don't re-eval the aci on it.
@firstyear Hi, I think you miss a few points here.
- I addressed the limitations of this patch: it addresses only eq in OR filter lists, but as I said, to do it properly it would have to be done at the level where the real access test on the attribute is done. But this would mean that the "history" of evaluated attrs has to be passed to all the filter eval functions - too much effort before verifying that it addresses the current problem.
- The history length is only 1, but that could be changed.
- The optimization is effective for any consecutive sequence of the same attribute. In (|(cn=....)(uid=....)(uid=....)) it would first evaluate access to cn and store cn in has_access, next it detects that uid != cn and evaluates uid and stores uid in has_access; but increasing the history length could also cover (|(uid=....)(cn=....)(uid=....)) (see the sketch below).
- The evaluation is for filter access only and it is done for each entry again; the result for entry a does not affect entry b.
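A standalone sketch of that idea (names such as has_access and check_attr_access are illustrative here, not the actual patch code): while testing filter access for one entry, remember the last attribute type checked and reuse the verdict for consecutive components with the same attribute.

#include <stdio.h>
#include <string.h>

/* Stand-in for the real (expensive) per-attribute aci evaluation. */
static int check_attr_access(const char *entry_dn, const char *attr)
{
    printf("aci evaluation of %s for %s\n", attr, entry_dn);
    return 1;   /* pretend access is allowed */
}

/* Evaluate filter access for one entry; attrs[i] is the attribute type of
 * filter component i, allowed[i] receives the verdict. */
static void filter_access(const char *entry_dn, const char **attrs,
                          int *allowed, size_t n)
{
    const char *has_access_attr = NULL;   /* "history" of length 1 */
    int has_access = 0;

    for (size_t i = 0; i < n; i++) {
        if (has_access_attr == NULL || strcmp(attrs[i], has_access_attr) != 0) {
            has_access = check_attr_access(entry_dn, attrs[i]);
            has_access_attr = attrs[i];
        }
        /* else: consecutive component with the same attribute, reuse verdict */
        allowed[i] = has_access;
    }
}

int main(void)
{
    /* The 400+ consecutive nsUniqueId components of the reported filter would
     * collapse to a single aci evaluation, plus one for modifyTimestamp.
     * The history is per entry, so entry a never influences entry b. */
    const char *attrs[] = {"modifyTimestamp", "nsUniqueId", "nsUniqueId",
                           "nsUniqueId", "nsUniqueId"};
    int allowed[5];
    filter_access("uid=a,ou=People,dc=example,dc=com", attrs, allowed, 5);
    return 0;
}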
I have tested the patch from comment https://pagure.io/389-ds-base/issue/49290#comment-447205; it brings the performance of the search back on par with 1.3.4 (20s).
@pj101 thanks for testing. If I understood correctly, indexing modifyTimestamp for equality gives even better performance, so we have some time to do this optimization properly, not only ad hoc for equality and OR filter lists.
@lkrispen It looks like for nsTombstone entries we only index objectClass=nsTombstone, nsUniqueId, nscpEntryDN and nsTombstoneCSN. I think that if we add entryusn to this list, then we can leave the current behaviour in place in the idl_set patch. This would mean:
(&(objectclass=nsTombstone)(entryusn<=%s))
(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))
(&(%s<=%s)(objectclass=nsTombstone))", SLAPI_ATTR_TOMBSTONE_CSN
All work with the tombstones. It would also mean other queries still would not pick up the tombstones during a search op. This seems like the simplest solution.
I think the only side-effect might be that in an entryusn search you would now see nsTombstones in the list of updates, but is that a bad change? Really, these are "part of the updates" that are occurring. Without the aci to see them, they would be filtered anyway.
Attachment: 0001-Ticket-49290-improve-idl-handling-in-complex-searche.patch
@lkrispen This adds the nstombstones to entryusn index always, and alters a single test case to assert the correct behaviour. This passes all the test suites including replication and the ticket tests ds/dirsrvtests/tests/tickets/ticket48005_test.py ds/dirsrvtests/tests/tickets/ticket48252_test.py which both check usn as well as the usn tombstone clean task.
Should we open a new ticket to improve aci evaluation performance for ALLIDS searches then?
<img alt="0001-Ticket-49290-improve-idl-handling-in-complex-searche.patch" src="/389-ds-base/issue/raw/300a04c2b4387e311b309f4e41ace8457df8fbe1593ee8bd49d4b3eb7bc394ce-0001-Ticket-49290-improve-idl-handling-in-complex-searche.patch" />
Tweak to the patch to loosely sort incoming IDLs - this should now improve the search case that @pj101 has with the | filter.
@lkrispen Yes, indexing modifyTimestamp resolves the original problem, resulting in better performance than full table scans. There is no hurry; I just wanted to dig out what happened between 1.3.4 and 1.3.6 to cause that performance hit. Now that we know it, we are at least aware of it, and you can take your time.
@firstyear Is entryUSN now a mandatory attribute, so that we can count on its presence and indexing? I have always activated it explicitly using the USN plugin in the server configuration scripts.
@lkrispen I may have a follow up patch to this which improves union and intersect behaviour again by reducing the number of allocations. Should we make that a second patch? Or are you happy for me to revise this (yet again)? :)
one patch would be easier to review
I have tested this patch - it has the same timing for the initial filter with unindexed modifyTimeStamp as unpatched 1.3.6 (90s).
Metadata Update from @mreynolds: - Issue set to the milestone: 1.3.7.0 (was: 1.3.6.0)
@pj101 My patch affects the indexed modifyTimestamp case - not the unindexed case.
<img alt="0001-Ticket-49290-improve-idl-handling-in-complex-searche.patch" src="/389-ds-base/issue/raw/1862e263d97d81a8acb767b114d6170f0d5515b22a2efa7d8d29764189525b77-0001-Ticket-49290-improve-idl-handling-in-complex-searche.patch" />
Final version of the idl_set patch. This adds k-way union and intersection, as well as some basic case shortcuts. In single thread searches I have seen up to 10% reduction in etime, but in multi-thread load tests (ldclt) I see roughly 2% - 4% depending on query type.
An important aspect of this patch is that it makes complex filters more efficient to handle. This means filters with at least 3 & conditions or 3 | conditions. For filters that use 2 or fewer, this patch does not affect performance.
For example,
(&(objectClass=person)(|(uid=test0000001) ....)) (&(objectClass=organizationalPerson)(objectClass=posixAccount)(objectClass=person)(|(uid=test0000001) ... ))
We tend to see filters like the latter emitted by software like SSSD quite a lot so improvements to this handling is going to help us across the board.
Another improvement of this is that filter operator order now does not affect performance. Previously these queries had a significant difference in processing time:
(|(modifyTimestamp>=X)(uid=X)(uid=....)) (|(uid=X)(uid=....)(modifyTimestamp>=X))
The second query would outperform the former, but with the k-way algorithm this is not the case.
Finally, in the case you have an allids in an | or empty in &, we shortcut earlier without doing any list processing. This makes these operations faster because we don't do work and throw it away.
In the future because of the design of this there are now other locations to improve.
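As a rough standalone illustration of the k-way union idea (a sketch only, not the patch's actual idl_set code): every input ID list is walked exactly once and the output is built in a single pass, instead of re-merging a growing intermediate once per component.

#include <stdio.h>
#include <stdlib.h>

struct idlist { const unsigned *ids; size_t n; size_t itr; };

/* Union k sorted ID lists; returns a newly allocated sorted array. */
static unsigned *kway_union(struct idlist *lists, size_t k, size_t *outn)
{
    size_t total = 0;
    for (size_t i = 0; i < k; i++) total += lists[i].n;

    unsigned *out = malloc(total * sizeof(unsigned));
    size_t n = 0;

    for (;;) {
        /* Find the smallest ID among the current heads of all lists. */
        int have_min = 0;
        unsigned min = 0;
        for (size_t i = 0; i < k; i++) {
            if (lists[i].itr < lists[i].n) {
                unsigned v = lists[i].ids[lists[i].itr];
                if (!have_min || v < min) { min = v; have_min = 1; }
            }
        }
        if (!have_min) break;           /* all lists exhausted */

        out[n++] = min;                 /* emit once, even if several lists hold it */
        for (size_t i = 0; i < k; i++)  /* advance every list sitting on min */
            if (lists[i].itr < lists[i].n && lists[i].ids[lists[i].itr] == min)
                lists[i].itr++;
    }
    *outn = n;
    return out;
}

int main(void)
{
    const unsigned a[] = {1, 4, 9}, b[] = {2, 4, 8}, c[] = {3, 9, 10};
    struct idlist lists[] = {{a, 3, 0}, {b, 3, 0}, {c, 3, 0}};
    size_t n;
    unsigned *u = kway_union(lists, 3, &n);
    for (size_t i = 0; i < n; i++) printf("%u ", u[i]);   /* 1 2 3 4 8 9 10 */
    printf("\n");
    free(u);
    return 0;
}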
Metadata Update from @firstyear: - Custom field reviewstatus adjusted to review
idl_compare is defined and implemented but not used
Yep, I was using it to assert in tests that my new code gave the same answers. I'll remove that and add the filter test threshold back in.
Attachment: 0001-Ticket-49290-improve-idl-handling-in-complex-searche.patch
@lkrispen @tbordaz This adds the filter test threshold back. It passes basic and filter tests. I expanded the filter tests to test both test_threshold, 2-way and k-way paths.
About allids and NOT: if we have a filter "(&(objectclass=*)(!(objectclass=organizationalunit)))"
and many entries, the first component is allids; assume there is only one org unit. We enter idl_set_intersect and detect that the result list is allids. Now we go to handle the NOT part. We have:
(gdb) p result_list
$2 = {b_nmax = 0, b_nids = 10005, next = 0x0, itr = 0, b_ids = {0}}
(gdb) p idl_set
$1 = {count = 0, allids = 1, min_size = 0, total_size = 0, head = 0x0, complement_head = 0x7f30580017b0}
and we enter:
if (idl_set->complement_head != NULL && result_list->b_nids > 0)
and in idl_notin() run into this case:
if ( ALLIDS( a ) ) { /* Not convinced that this code is really worth it */
I am convinced that this code is not worth it: (allids - idl) should be allids.
About the threshold: I don't like the ifdefs. Since we know that FILTER_TEST_THRESHOLD is defined, the code should always work for idl_set->minimum->b_nids <= FILTER_TEST_THRESHOLD, whether FILTER_TEST_THRESHOLD == 0 or > 0. I can even imagine that we will one day want to make it configurable and just use the defined threshold as the default. I would suggest something like:
- extend idl_set to have a field threshold
- in idl_set_create do #ifdef FILTER_TEST_THRESHOLD { threshold = FILTER_TEST_THRESHOLD }
- compare idl_set->minimum->b_nids <= idl_set->threshold
I'm happy to fix the filter_test_threshold ifdef, but the allids-with-not is a separate issue IMO - we can't keep piling improvements into this patch or else we'll never land it.
Let's get this in in its current form, then we can fix the allids not case. It's already the case that this code works as described; we are only looking at it because I happen to be working nearby :) It's great we are looking at this issue, but let's keep it separate from the idl_set code.
<img alt="0001-Ticket-49290-improve-idl-handling-in-complex-searche.patch" src="/389-ds-base/issue/raw/2184c9ff93dfd3faae8167d29b6e26204e6f5c53249304a0f5563dbf34788758-0001-Ticket-49290-improve-idl-handling-in-complex-searche.patch" />
I'm happy to fix the filter_test_threshold if def,
thanks
but the allids-with-not is a separate issue IMO - we can't keep piling improvements into this patch or else we'll never land it.
OK, I was under the impression that your patch had removed a shortcut from the current implementation, but I was wrong. Your handling of the NOT part is already better than before, by handling them after intersecting the other components. I had thought of a simple check before going into the not handling: if (idl_is_allids(result_list)) return result_list;
But we can do this in a separate ticket, so I'm ack with this.
commit 30c4f30 To ssh://git@pagure.io/389-ds-base.git 2339b25..30c4f30 master -> master
Thanks mate! Really appreciate your thorough review of this :)
@lkrispen That still leaves the ACI issue at hand. How do we want to progress on this?
I think we need two follow up tickets: one for the NOT case and one for the ACI stuff
@lkrispen Did we produce the ACI issue ticket from this?
Metadata Update from @firstyear: - Custom field component adjusted to None - Custom field origin adjusted to None - Custom field version adjusted to None
Created #49456 for the aci issue.
Do we still need something for the NOT case?
Metadata Update from @lkrispen: - Custom field reviewstatus adjusted to ack (was: review)
@lkrispen For now let's focus on OR, and then check NOT as well? Maybe open a ticket to check the NOT aci handling?
There is no NOT aci problem; the performance problem for aci is in AND/OR with filter components using the same attribute repeatedly - and I have created the ticket for this.
I was unsure if we need something for NOT in the idl handling, and I think we should have another ticket for that. And close this one as fixed.
Yep, sounds good to me. I think there could be improvements in NOT handling, but I'll need to investigate. I'll open that ticket up.
Closing this issue.
Metadata Update from @firstyear: - Issue close_status updated to: fixed - Issue status updated to: Closed (was: Open)
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/2349
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: fixed)