#50066 Missing index for filter (objectClass=*)
Closed: wontfix 5 years ago by firstyear. Opened 5 years ago by cheimes.

Issue Description

By analysing the access log with logconf.pl I noticed several missing indexes for LDAP searches with filter (objectClass=*). This filter is commonly used by python-ldap to signal a wildcard query. It looks like the problem is caused by the index definition in cn=objectclass,cn=default indexes,cn=config,cn=ldbm database,cn=plugins,cn=config and cn=objectclass,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config. The default index for objectclass is just nsIndexType: eq. The eq filter is not used for * filters. A * query uses a presence index, but there is no pres index for objectClass.

Package Version and Platform

389-ds-base-1.4.0.16-1

Steps to reproduce

  1. perform an ldapsearch with filter '(objectClass=*)'.
  2. wait a bit or restart 389-DS to flush access log to disk
  3. logconv.pl -U /var/log/dirsrv/*/access

Actual results

  -  Unindexed Filters:   Filter:   (objectclass=*) (occurrences 11)
                          - Bind DN:  cn=directory manager (binds  11)

Expected results

No unindexed filters


This is correct, because every entry has an objectclass, so having a index for pres does not make any sense. It would index every single entry in the database, which is not efficient. In such cases it is faster to just scan the main database(id2entry) then to read every single ID in the database from an index, then scan id2entry to get the entry (it's double the work and makes the index pointless).

Metadata Update from @mreynolds:
- Custom field component adjusted to None
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None
- Custom field type adjusted to None
- Custom field version adjusted to None

5 years ago

Thanks for the clarification. Would it make sense to special case '(objectClass=*)' so that neither nsslapd-require-index nor logconf.pl -U consider searches with filter '(objectClass=*)' as unindexes searches?

Thanks for the clarification. Would it make sense to special case '(objectClass=)' so that neither nsslapd-require-index nor logconf.pl -U consider searches with filter '(objectClass=)' as unindexes searches?

Not really. nsslapd-require-index should not be "worked around". If this is set to "on", then it should block "full" unindexed searches. logconv.pl just reports what is in the log, again if the server reports an unindexed search in the log, then the tool should report it too.

Now in logconv.pl (and in the access logs) there are two types of unindexed searches, and these are reported as:

notes=U --> unindexed component (not block by nsslapd-require-index - they can still be bad, but not always).

notes=A --> Full database scan. Very serious, causes 100% CPU usage and grinds the server to a halt. These are blocked by nsslapd-require-index.

The reason notes=U searches are not always blocked by nsslapd-require-index is because the idscanlimit is was not exceeded for that search. Really it means it's a small database.

Here is example of a small db:

[03/Dec/2018:12:01:41.589615841 -0500] conn=1 op=0 BIND dn="cn=dm" method=128 version=3
[03/Dec/2018:12:01:41.672498657 -0500] conn=1 op=0 RESULT err=0 tag=97 nentries=0 etime=0.0083059536 dn="cn=dm"
[03/Dec/2018:12:01:41.672780132 -0500] conn=1 op=1 SRCH base="dc=example,dc=com" scope=2 filter="(objectClass=*)" attrs=ALL
[03/Dec/2018:12:01:41.675638357 -0500] conn=1 op=1 RESULT err=0 tag=101 nentries=9 etime=0.0002999759 notes=U
[03/Dec/2018:12:01:41.923239923 -0500] conn=1 op=2 UNBIND

The search was allowed, and we see notes=U, but the result set is small (9 entries), which is less than the default idscanllimit of 4000. Now if I add more users (4000+), then things change:

[03/Dec/2018:12:10:42.214207951 -0500] conn=3 op=0 BIND dn="cn=dm" method=128 version=3
[03/Dec/2018:12:10:42.295424471 -0500] conn=3 op=0 RESULT err=0 tag=97 nentries=0 etime=0.0081279378 dn="cn=dm"
[03/Dec/2018:12:10:42.295570640 -0500] conn=3 op=1 SRCH base="dc=example,dc=com" scope=2 filter="(objectClass=*)" attrs=ALL
[03/Dec/2018:12:10:42.295804349 -0500] conn=3 op=1 RESULT err=53 tag=101 nentries=0 etime=0.0000272935 notes=A
[03/Dec/2018:12:10:42.295886924 -0500] conn=3 op=2 UNBIND

First we can see the operation was rejected (err=53 - unwilling to perform), and we now see notes=A instead of notes=U

So when looking at logconv.pl output, you first want to address the notes=A search (Full unindexed searches). Then you can start to look at the notes=U searches. Typically these are compound search filters, and notes=U means one (or more) of the filter components is unindexed. The reason it's not a notes=A is because the server was able to generate the result set without having to scan the entire database. So it's not super serious as long as the etime is short, but it could be improved by indexing the attribute used in the "bad" component(s).

This reminds me of a recent issue I helped on:

https://pagure.io/freeipa/issue/7786

In this case the searches were not fully unindexed, but adding the missing indexes did improve performance.

Sorry that was a lot of info, but I hope it helps clear things up on how this all works.

notes=U --> unindexed component (not block by nsslapd-require-index - they can still be bad, but not always).

This is the bug then, that we give a false error message. We should indicate that objectclass= as a pres is not unindexed when we set the flags. That way a search like ((uid=william)(objectclass=)) will not show notes=U.

nsslapd-require-index is to prevent a full db scan, if user or system indexes manage to limit the candidate list to less than idscanlimit then the request is not blocked and it looks normal to me.
I do not read 'notes=U' as an error message but rather a warning that the user filter is unindexed, but the server manages to process it without breaking idscanlimit. So this kind of filter should be improved because it can become more expensive.

@tbordaz It's a problem because we now normalise "notes=U is fine with objectClass=" but if the filter is '(&(objectClass=)(uid=x)(cn=y))', notes=U appears, but it masks the lack of index on CN perhaps. And of course, uid or cn filters could be broad and slow (we removed filter optimising anyway...).

As a result, the issue is we mislead admins to a false sense of "notes=U is fine if objectClass=*" is there, but it's hiding other issues. Our error message is clearly reporting a false error, which has led to a bug report like this one.

Therfore the error is that we are communicating incorrectly to our users. We should only report notes=U if there truly is an unindexed component.

Hi @firstyear, sorry for my slow brain and poor english (you know how bad my english is ;) )... but I am missing your point :(

First of all, I agree that notes=U is an alarming message. It means "we managed to index your search but be aware the provided filter is not indexed and may later lead to allid"

In the problematic example '(&(objectClass=)(uid=x)(cn=y))' if it returns notes=U. That means objectclass is not indexed in presence and 'uid' and* 'cn' are not indexed in equality. So IIUC it is not hiding a lack of index.

Also I am not sure to understand what is your request or change you would like to see.

I "think" @firstyear wants an RFE to do something like this:

notes=U(<attrs that are not indexed>)

notes=U(objectclass, uid)

notes=A(<unindexed attrs>)

Then you have more info to further troubleshoot the filter. I know personally I think something like this would be very useful. I don't care about the syntax of the message but if we can see what attributes in a filter that are triggering the "notes=" it be very beneficial to everyone (customers, support, dev,...)

thanks @mreynolds, I agree it is a nice RFE. Note it is not relate to the original ticket.
Regarding that RFE, it could also be toggle with an appropriate access-log-level.

thanks @mreynolds, I agree it is a nice RFE. Note it is not relate to the original ticket.
Regarding that RFE, it could also be toggle with an appropriate access-log-level.

Opened RFE:

https://pagure.io/389-ds-base/issue/50133

Back to this issue. This can be closed, right? My comments here should be to explain everything:

https://pagure.io/389-ds-base/issue/50066#comment-544130

Other concerns can be addressed in the RFE ticket? Thoughts?

I thing this is a great idea, and I've commented on your new ticket. Additionally, as a note, once we fix the filter optimisation again we can talk about the need for idlistscanlimit :) go ahead and close this now I think.

Yes, please close the ticket. I can't close it myself.

#50133 and @firstyear 's comment https://pagure.io/389-ds-base/issue/50133#comment-548658 cover all my concerns. Thanks!

Metadata Update from @firstyear:
- Issue close_status updated to: invalid
- Issue status updated to: Closed (was: Open)

5 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/3125

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: invalid)

3 years ago

Login to comment on this ticket.

Metadata