Issue #50066: Missing index for filter (objectClass=*) - 389-ds-base

389-ds-base

#50066 Missing index for filter (objectClass=*)

Closed: wontfix 5 years ago by firstyear. Opened 5 years ago by cheimes.

Issue Description

By analysing the access log with logconf.pl I noticed several missing indexes for LDAP searches with filter (objectClass=*). This filter is commonly used by python-ldap to signal a wildcard query. It looks like the problem is caused by the index definition in cn=objectclass,cn=default indexes,cn=config,cn=ldbm database,cn=plugins,cn=config and cn=objectclass,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config. The default index for objectclass is just nsIndexType: eq. The eq filter is not used for * filters. A * query uses a presence index, but there is no pres index for objectClass.

Package Version and Platform

389-ds-base-1.4.0.16-1

Steps to reproduce

perform an ldapsearch with filter '(objectClass=*)'.
wait a bit or restart 389-DS to flush access log to disk
logconv.pl -U /var/log/dirsrv/*/access

Actual results

  -  Unindexed Filters:   Filter:   (objectclass=*) (occurrences 11)
                          - Bind DN:  cn=directory manager (binds  11)

Expected results

No unindexed filters

mreynolds commented 5 years ago

This is correct, because every entry has an objectclass, so having a index for pres does not make any sense. It would index every single entry in the database, which is not efficient. In such cases it is faster to just scan the main database(id2entry) then to read every single ID in the database from an index, then scan id2entry to get the entry (it's double the work and makes the index pointless).

Metadata Update from @mreynolds:
- Custom field component adjusted to None
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None
- Custom field type adjusted to None
- Custom field version adjusted to None

5 years ago

cheimes commented 5 years ago

Thanks for the clarification. Would it make sense to special case '(objectClass=*)' so that neither nsslapd-require-index nor logconf.pl -U consider searches with filter '(objectClass=*)' as unindexes searches?

mreynolds commented 5 years ago

Thanks for the clarification. Would it make sense to special case '(objectClass=)' so that neither nsslapd-require-index nor logconf.pl -U consider searches with filter '(objectClass=)' as unindexes searches?

Not really. nsslapd-require-index should not be "worked around". If this is set to "on", then it should block "full" unindexed searches. logconv.pl just reports what is in the log, again if the server reports an unindexed search in the log, then the tool should report it too.

Now in logconv.pl (and in the access logs) there are two types of unindexed searches, and these are reported as:

notes=U --> unindexed component (not block by nsslapd-require-index - they can still be bad, but not always).

notes=A --> Full database scan. Very serious, causes 100% CPU usage and grinds the server to a halt. These are blocked by nsslapd-require-index.

The reason notes=U searches are not always blocked by nsslapd-require-index is because the idscanlimit is was not exceeded for that search. Really it means it's a small database.

Here is example of a small db:

[03/Dec/2018:12:01:41.589615841 -0500] conn=1 op=0 BIND dn="cn=dm" method=128 version=3
[03/Dec/2018:12:01:41.672498657 -0500] conn=1 op=0 RESULT err=0 tag=97 nentries=0 etime=0.0083059536 dn="cn=dm"
[03/Dec/2018:12:01:41.672780132 -0500] conn=1 op=1 SRCH base="dc=example,dc=com" scope=2 filter="(objectClass=*)" attrs=ALL
[03/Dec/2018:12:01:41.675638357 -0500] conn=1 op=1 RESULT err=0 tag=101 nentries=9 etime=0.0002999759 notes=U
[03/Dec/2018:12:01:41.923239923 -0500] conn=1 op=2 UNBIND

The search was allowed, and we see notes=U, but the result set is small (9 entries), which is less than the default idscanllimit of 4000. Now if I add more users (4000+), then things change:

[03/Dec/2018:12:10:42.214207951 -0500] conn=3 op=0 BIND dn="cn=dm" method=128 version=3
[03/Dec/2018:12:10:42.295424471 -0500] conn=3 op=0 RESULT err=0 tag=97 nentries=0 etime=0.0081279378 dn="cn=dm"
[03/Dec/2018:12:10:42.295570640 -0500] conn=3 op=1 SRCH base="dc=example,dc=com" scope=2 filter="(objectClass=*)" attrs=ALL
[03/Dec/2018:12:10:42.295804349 -0500] conn=3 op=1 RESULT err=53 tag=101 nentries=0 etime=0.0000272935 notes=A
[03/Dec/2018:12:10:42.295886924 -0500] conn=3 op=2 UNBIND

First we can see the operation was rejected (err=53 - unwilling to perform), and we now see notes=A instead of notes=U

So when looking at logconv.pl output, you first want to address the notes=A search (Full unindexed searches). Then you can start to look at the notes=U searches. Typically these are compound search filters, and notes=U means one (or more) of the filter components is unindexed. The reason it's not a notes=A is because the server was able to generate the result set without having to scan the entire database. So it's not super serious as long as the etime is short, but it could be improved by indexing the attribute used in the "bad" component(s).

This reminds me of a recent issue I helped on:

https://pagure.io/freeipa/issue/7786

In this case the searches were not fully unindexed, but adding the missing indexes did improve performance.

Sorry that was a lot of info, but I hope it helps clear things up on how this all works.

mreynolds commented 5 years ago

@cheimes okay to close this?

firstyear commented 5 years ago

notes=U --> unindexed component (not block by nsslapd-require-index - they can still be bad, but not always).

This is the bug then, that we give a false error message. We should indicate that objectclass= as a pres is not unindexed when we set the flags. That way a search like ((uid=william)(objectclass=)) will not show notes=U.

tbordaz commented 5 years ago

nsslapd-require-index is to prevent a full db scan, if user or system indexes manage to limit the candidate list to less than idscanlimit then the request is not blocked and it looks normal to me.
I do not read 'notes=U' as an error message but rather a warning that the user filter is unindexed, but the server manages to process it without breaking idscanlimit. So this kind of filter should be improved because it can become more expensive.

firstyear commented 5 years ago

@tbordaz It's a problem because we now normalise "notes=U is fine with objectClass=" but if the filter is '(&(objectClass=)(uid=x)(cn=y))', notes=U appears, but it masks the lack of index on CN perhaps. And of course, uid or cn filters could be broad and slow (we removed filter optimising anyway...).

As a result, the issue is we mislead admins to a false sense of "notes=U is fine if objectClass=*" is there, but it's hiding other issues. Our error message is clearly reporting a false error, which has led to a bug report like this one.

Therfore the error is that we are communicating incorrectly to our users. We should only report notes=U if there truly is an unindexed component.

tbordaz commented 5 years ago

Hi @firstyear, sorry for my slow brain and poor english (you know how bad my english is ;) )... but I am missing your point :(

First of all, I agree that notes=U is an alarming message. It means "we managed to index your search but be aware the provided filter is not indexed and may later lead to allid"

In the problematic example '(&(objectClass=)(uid=x)(cn=y))' if it returns notes=U. That means objectclass is not indexed in presence and 'uid' and* 'cn' are not indexed in equality. So IIUC it is not hiding a lack of index.

Also I am not sure to understand what is your request or change you would like to see.

mreynolds commented 5 years ago

I "think" @firstyear wants an RFE to do something like this:

notes=U(<attrs that are not indexed>)

notes=U(objectclass, uid)

notes=A(<unindexed attrs>)

Then you have more info to further troubleshoot the filter. I know personally I think something like this would be very useful. I don't care about the syntax of the message but if we can see what attributes in a filter that are triggering the "notes=" it be very beneficial to everyone (customers, support, dev,...)

tbordaz commented 5 years ago

thanks @mreynolds, I agree it is a nice RFE. Note it is not relate to the original ticket.
Regarding that RFE, it could also be toggle with an appropriate access-log-level.

mreynolds commented 5 years ago

thanks @mreynolds, I agree it is a nice RFE. Note it is not relate to the original ticket.
Regarding that RFE, it could also be toggle with an appropriate access-log-level.

Opened RFE:

https://pagure.io/389-ds-base/issue/50133

Back to this issue. This can be closed, right? My comments here should be to explain everything:

https://pagure.io/389-ds-base/issue/50066#comment-544130

Other concerns can be addressed in the RFE ticket? Thoughts?

firstyear commented 5 years ago

I thing this is a great idea, and I've commented on your new ticket. Additionally, as a note, once we fix the filter optimisation again we can talk about the need for idlistscanlimit :) go ahead and close this now I think.

cheimes commented 5 years ago

Yes, please close the ticket. I can't close it myself.

#50133 and @firstyear 's comment https://pagure.io/389-ds-base/issue/50133#comment-548658 cover all my concerns. Thanks!

Metadata Update from @firstyear:
- Issue close_status updated to: invalid
- Issue status updated to: Closed (was: Open)

5 years ago

spichugi commented 3 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/3125

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: invalid)

3 years ago

Metadata

Assignee

None

Tags

None

Blocking

None

Depending on

None

Priority

None

Milestone

None

reviewstatus

None

rhbz

None

origin

None

389-ds-base

Source Code

#50066 Missing index for filter (objectClass=*) Closed: wontfix 5 years ago by firstyear. Opened 5 years ago by cheimes.

Issue Description

Package Version and Platform

Steps to reproduce

Actual results

Expected results

Metadata

#50066 Missing index for filter (objectClass=*)

Closed: wontfix 5 years ago by firstyear. Opened 5 years ago by cheimes.