#47504 idlistscanlimit per index/type/value
Closed: wontfix. Opened 8 years ago by rmeggins.

With very large databases, some queries do a lot of work to build huge ID lists for filter components that match many IDs. For example, a search for (&(objectclass=inetorgperson)(uid=foo)) may build a huge idlist for objectclass=inetorgperson, only to throw most of it away when it is intersected with uid=foo. In these cases, it would be useful to be able to tell the indexing code to use a different idlistscanlimit for certain indexes, or to use no idlist at all. In the above case, it would be useful to tell the indexing code to skip building an idlist for objectclass=inetorgperson, but still use the default idlistscanlimit for other objectclass searches (e.g. objectclass=groupOfNames).
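To illustrate why skipping the idlist for one component does not change the candidates of an AND filter, here is a minimal sketch. It is not the server's idlist code: struct idlist and idl_intersect() are hypothetical names, and the model simply assumes that a skipped component behaves like an "all IDs" list, so the other component alone decides the candidates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified ID list: either "all IDs" or a sorted array. */
struct idlist {
    int allids;          /* nonzero: every entry is treated as a candidate */
    size_t n;            /* number of IDs when allids is zero */
    const unsigned *ids; /* IDs sorted in ascending order */
};

/*
 * Intersect two ID lists for an AND filter.
 * If one side is "all IDs" (its list was never built), the result is simply
 * the other side. Otherwise do a linear merge of the two sorted arrays into
 * out, which must have room for min(a.n, b.n) entries.
 */
static struct idlist
idl_intersect(struct idlist a, struct idlist b, unsigned *out)
{
    struct idlist r = {0, 0, out};
    size_t i = 0, j = 0;

    if (a.allids) {
        return b;
    }
    if (b.allids) {
        return a;
    }
    while (i < a.n && j < b.n) {
        if (a.ids[i] < b.ids[j]) {
            i++;
        } else if (a.ids[i] > b.ids[j]) {
            j++;
        } else {
            out[r.n++] = a.ids[i];
            i++;
            j++;
        }
    }
    return r;
}
```

In this model, objectclass=inetorgperson with its list skipped is the "all IDs" side, and the result of the intersection is just the short list for uid=foo.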

This would also help in https://fedorahosted.org/389/ticket/47474 - if there are several million IDs for each of the objectclass= filter components, being able to skip id list generation for the objectclass values would make that query very fast.

We can't reuse nsslapd-idlistscanlimit, so perhaps we need a new attribute:

dn: cn=attrname,cn=index,...
objectclass: nsIndex
nsIndexIDSize: NNNN[:type][:eqvalue:eqvalue:...]

Where NNNN is the max ID list size (or 0 for no list at all)
type is the type of index (sub, pres, eq)
eqvalue are for equality indexes - these are the values to which the max ID list size applies
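As a rough sketch of how a value in the proposed NNNN[:type][:eqvalue:eqvalue:...] form could be split into its parts: this follows only the syntax described in this ticket, not necessarily what was finally implemented, and struct idsize_spec, parse_idsize() and the fixed buffer sizes are all made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_VALUES 16

/* Hypothetical holder for one parsed nsIndexIDSize value. */
struct idsize_spec {
    long limit;                  /* NNNN: max ID list size, 0 = no list */
    char type[8];                /* "eq", "sub", "pres", or "" if omitted */
    char values[MAX_VALUES][64]; /* equality values the limit applies to */
    int nvalues;
};

/* Parse "NNNN[:type][:eqvalue:eqvalue:...]". Returns 0 on success, -1 on error. */
static int
parse_idsize(const char *s, struct idsize_spec *out)
{
    char *end = NULL;
    const char *p;
    size_t len;

    if (s == NULL || out == NULL) {
        return -1;
    }
    memset(out, 0, sizeof(*out));

    /* First field: the numeric limit. */
    out->limit = strtol(s, &end, 10);
    if (end == s || out->limit < 0) {
        return -1; /* no digits, or a negative limit */
    }
    if (*end == '\0') {
        return 0; /* limit only, applies to the whole index */
    }
    if (*end != ':') {
        return -1;
    }

    /* Second field: the index type. */
    p = end + 1;
    len = strcspn(p, ":");
    if (len == 0 || len >= sizeof(out->type)) {
        return -1;
    }
    memcpy(out->type, p, len); /* struct was zeroed, so this stays terminated */
    if (strcmp(out->type, "eq") != 0 && strcmp(out->type, "sub") != 0 &&
        strcmp(out->type, "pres") != 0) {
        return -1;
    }
    p += len;

    /* Remaining fields: equality values. */
    while (*p == ':') {
        p++;
        len = strcspn(p, ":");
        if (len == 0 || len >= sizeof(out->values[0]) ||
            out->nvalues >= MAX_VALUES) {
            return -1;
        }
        memcpy(out->values[out->nvalues], p, len);
        out->nvalues++;
        p += len;
    }

    /* Values only make sense for equality indexes. */
    if (out->nvalues > 0 && strcmp(out->type, "eq") != 0) {
        return -1;
    }
    return 0;
}
```

For example, parsing "0:eq:inetOrgPerson:groupOfNames" would give a limit of 0, type "eq" and two values, while "4000" alone would give a limit with no type or values.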

So in the case of ticket/47474, something like

dn: cn=objectclass,...
objectclass: nsIndex
nsIndexType: eq
nsIndexIDSize: 0:eq:organizationalPerson:inetOrgPerson:organization:organizationalUnit:groupOfNames:groupOfUniqueNames:group

This would effectively disable id list generation for the objectclass values listed.

Note that this will apply to all queries for any of the objectclass values, not just their use in conjunction with this particular search filter.


This looks like a good flexible approach that would be useful for many different situations.

A couple of questions...
1. Is "nsIndexIDListScanLimit" perpendicular to these config params?
nsslapd-pagedlookthroughlimit: 0
nsslapd-pagedidlistscanlimit: 0
nsslapd-rangelookthroughlimit: 5000
2. I think the answer is yes :), but if "nsIndexIDListScanLimit" is set, is this original config param ignored?
nsslapd-idlistscanlimit: 4000
3. This is a request...
It'd be nice to check whether values is NULL or does not contain '=' (in which case ptr would be NULL). If that can't happen, could you add a comment (or a PR_ASSERT) saying so?
206 attr_index_parse_idlistsize_values(Slapi_Attr *attr, struct index_idlistsizeinfo *idlinfo, char *values, const char *strval, char *returntext)
207 {
...
210         char *ptr = PL_strchr(values, '=');
...
220         ++ptr;

356 attr_index_parse_idlistsize_limit(char *ptr, struct index_idlistsizeinfo *idlinfo, char *returntext) 
357 { 
...
361         ptr++;

380 attr_index_parse_idlistsize_type(char *ptr, struct attrinfo *ai, struct index_idlistsizeinfo *idlinfo, const char *val, const char *strval, char *returntext) 
381 { 
...
389         do { 
390                 ++ptr;

458 attr_index_parse_idlistsize_flags(char *ptr, struct index_idlistsizeinfo *idlinfo, const char *val, const char *strval, char *returntext) 
459 { 
...
464         do { 
465                 ++ptr;

Replying to [comment:5 nhosoi]:

A couple of questions...
1. Is "nsIndexIDListScanLimit" perpendicular to these config params?
nsslapd-pagedlookthroughlimit: 0
nsslapd-pagedidlistscanlimit: 0
nsslapd-rangelookthroughlimit: 5000

I did not implement any special support for ranges. I will need to add that, as well as support for matching rules. But otherwise, yes, the new code will override nsslapd-pagedidlistscanlimit if set.

  2. I think the answer is yes :), but if "nsIndexIDListScanLimit" is set, is this original config param ignored?
    nsslapd-idlistscanlimit: 4000

Yes. If there is a matching request, the matching request will override this value. Otherwise, if there is no matching request, the default value of nsslapd-idlistscanlimit/nsslapd-pagedidlistscanlimit will be used.
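The override behaviour described in this answer can be sketched roughly as follows. This is only an illustration under assumptions: struct scan_limit_rule and effective_scan_limit() are hypothetical names, the first matching rule wins, and values are compared with plain strcmp(), whereas real attribute values would need syntax-aware comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-index rule: a limit for one index type and one value. */
struct scan_limit_rule {
    const char *type;  /* "eq", "sub", "pres", or NULL for any type */
    const char *value; /* equality value, or NULL for any value */
    int limit;         /* ID list limit to use when the rule matches */
};

/*
 * Return the limit of the first rule matching this index lookup, or
 * default_limit (standing in for nsslapd-idlistscanlimit or
 * nsslapd-pagedidlistscanlimit) when no rule matches.
 */
static int
effective_scan_limit(const struct scan_limit_rule *rules, size_t nrules,
                     const char *type, const char *value, int default_limit)
{
    size_t i;

    for (i = 0; i < nrules; i++) {
        const struct scan_limit_rule *r = &rules[i];

        if (r->type != NULL && strcmp(r->type, type) != 0) {
            continue; /* rule is for a different index type */
        }
        if (r->value != NULL &&
            (value == NULL || strcmp(r->value, value) != 0)) {
            continue; /* rule is for a different value */
        }
        return r->limit;
    }
    return default_limit;
}
```

With a single rule of {"eq", "inetOrgPerson", 0} and a default of 4000, an equality lookup for inetOrgPerson would get limit 0, and every other lookup would fall back to 4000.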

  3. This is a request...
    It'd be nice to check whether values is NULL or does not contain '=' (in which case ptr would be NULL). If that can't happen, could you add a comment (or a PR_ASSERT) saying so?

Ok. At this point in the code, ptr should always be set. So I'll add PR_ASSERT.
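For reference, the kind of guard being asked about could look like the sketch below. It uses the standard strchr() in place of NSPR's PL_strchr(), and value_after_equals() is a made-up helper, not a function from the patch. As far as I know, PR_ASSERT is only active in debug builds, so an explicit runtime check like this would be a complement to the assertion rather than a replacement for it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the text following the first '=' in values,
 * or NULL if values is NULL or contains no '='.
 * The caller never advances a NULL pointer this way.
 */
static const char *
value_after_equals(const char *values)
{
    const char *ptr;

    if (values == NULL) {
        return NULL;
    }
    ptr = strchr(values, '=');
    if (ptr == NULL) {
        return NULL; /* no '=' present, nothing to skip past */
    }
    return ptr + 1;
}
```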

Thanks for the answers, Rich. Ack.


The design document for this functionality is located here:

http://port389.org/wiki/Design/Fine_Grained_ID_List_Size

To ssh://git.fedorahosted.org/git/389/ds.git
5005db5..b5ad052 389-ds-base-1.2.11 -> 389-ds-base-1.2.11
commit b5ad052
Author: Rich Megginson rmeggins@redhat.com
Date: Mon Sep 16 09:49:14 2013 -0600
e61009e..3ea8e58 389-ds-base-1.3.0 -> 389-ds-base-1.3.0
commit 3ea8e58
Author: Rich Megginson rmeggins@redhat.com
Date: Mon Sep 16 09:49:14 2013 -0600
c244a9b..b348886 389-ds-base-1.3.1 -> 389-ds-base-1.3.1
commit b348886
Author: Rich Megginson rmeggins@redhat.com
Date: Mon Sep 16 09:49:14 2013 -0600
385b5dc..824b301 master -> master
commit 824b301
Author: Rich Megginson rmeggins@redhat.com
Date: Mon Sep 16 09:49:14 2013 -0600

Linked to Bugzilla bug: https://bugzilla.redhat.com/show_bug.cgi?id=1011539 (Red Hat Enterprise Linux 7)

To ssh://git.fedorahosted.org/git/389/ds.git
b5ad052..d83311a 389-ds-base-1.2.11 -> 389-ds-base-1.2.11
commit d83311a
Author: Rich Megginson rmeggins@redhat.com
Date: Tue Sep 24 08:18:57 2013 -0600
3ea8e58..527c3e4 389-ds-base-1.3.0 -> 389-ds-base-1.3.0
commit 527c3e4
Author: Rich Megginson rmeggins@redhat.com
Date: Tue Sep 24 08:18:57 2013 -0600
b348886..e95d7d6 389-ds-base-1.3.1 -> 389-ds-base-1.3.1
commit e95d7d6
Author: Rich Megginson rmeggins@redhat.com
Date: Tue Sep 24 08:18:57 2013 -0600
824b301..36f506d master -> master
commit 36f506d
Author: Rich Megginson rmeggins@redhat.com
Date: Tue Sep 24 08:18:57 2013 -0600

To ssh://git.fedorahosted.org/git/389/ds.git
d83311a..373e36a 389-ds-base-1.2.11 -> 389-ds-base-1.2.11
commit 373e36a
Author: Rich Megginson rmeggins@redhat.com
Date: Wed Sep 25 08:51:12 2013 -0600
527c3e4..c96eaa0 389-ds-base-1.3.0 -> 389-ds-base-1.3.0
commit c96eaa0
Author: Rich Megginson rmeggins@redhat.com
Date: Wed Sep 25 08:51:12 2013 -0600
e95d7d6..e5405e6 389-ds-base-1.3.1 -> 389-ds-base-1.3.1
commit e5405e6
Author: Rich Megginson rmeggins@redhat.com
Date: Wed Sep 25 08:51:12 2013 -0600
d9f25b7..058d01d master -> master
commit 058d01d
Author: Rich Megginson rmeggins@redhat.com
Date: Wed Sep 25 08:51:12 2013 -0600

Metadata Update from @nkinder:
- Issue set to the milestone: 1.3.2 - 09/13 (September)

4 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/841

If you want to receive further updates on the issue, please navigate to the GitHub issue
and click the subscribe button.

Thank you for understanding. We apologize for any inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

a year ago
