#50280 For SRCH result, log in access log the candidates number
Closed: duplicate 10 months ago by mreynolds. Opened a year ago by tbordaz.

Issue Description

For performance diagnostics, it could be helpful to log the number of candidate entries evaluated.
If the number of candidates is far above the number of returned entries, it indicates that the filter is not selective enough.
At the default access log level, this number would not be logged:

[19/Feb/2019:10:02:15.068106355 +0100] conn=3686737 op=2 SRCH base="dc=example,dc=com" scope=2 filter="(&(objectClass=sudoRole)(|(sudoHost=ALL)(sudoHost=myhost.example.com)(sudoHost=myhost)(sudoHost=*)(|(sudoHost=*\5C*)(sudoHost=*?*)(sudoHost=*\2A*)(sudoHost=*[*]*))))" attrs="objectClass cn sudoCommand sudoHost sudoUser sudoOption sudoRunAs sudoRunAsUser sudoRunAsGroup sudoNotBefore sudoNotAfter sudoOrder modifyTimestamp"
[19/Feb/2019:10:02:20.894823131 +0100] conn=3686737 op=2 RESULT err=0 tag=101 nentries=98 etime=6.051000 notes=U candidates=12500
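As a rough illustration of the diagnostic this ticket proposes, a RESULT line like the one above could be checked for selectivity. This is a hypothetical sketch: the `candidates=` field is the addition being requested here, not an existing 389-ds access log field, and `selectivity` is an illustrative helper, not part of any tool.

```python
import re

# Hypothetical parser for the RESULT line above; the candidates= field is the
# addition this ticket proposes, not an existing 389-ds access log field.
RESULT_RE = re.compile(
    r"RESULT .*nentries=(?P<nentries>\d+) etime=(?P<etime>[\d.]+)"
    r"(?: notes=\S+)? candidates=(?P<candidates>\d+)"
)

def selectivity(line):
    """Return (nentries, candidates, ratio) for a RESULT line, or None."""
    m = RESULT_RE.search(line)
    if not m:
        return None
    nentries = int(m.group("nentries"))
    candidates = int(m.group("candidates"))
    ratio = nentries / candidates if candidates else 0.0
    return nentries, candidates, ratio

line = ('[19/Feb/2019:10:02:20.894823131 +0100] conn=3686737 op=2 RESULT '
        'err=0 tag=101 nentries=98 etime=6.051000 notes=U candidates=12500')
print(selectivity(line))  # a ratio far below 1 suggests a poorly selective filter
```

With the sample line, 98 returned entries out of 12500 candidates gives a ratio under 1%, which is the situation the ticket wants to surface.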

Package Version and Platform


Steps to reproduce


I don't think this is necessarily true. This is like saying:

"(&(objectClass=person)(uid=william))" is not selective enough because we have to check 100,000 oc candidates to get to uid= which has 1.

Which isn't true at all. That filter is exactly as selective as it needs to be. IDListScanLimit is a naive attempt to create a heuristic filter optimiser, but actually ends up being more harmful than good because of wasted work.

So I think that this would be better if we could show the number of candidates at each step of the filter application, and if we could enable https://pagure.io/389-ds-base/pull-request/50252, because that's actually what's going to fix this kind of poorly laid-out search term.

PS: I wrote the filter optimiser because of that EXACT query that SSSD generates, which uses un-indexed substrings and is generally just awful on servers.

Metadata Update from @firstyear:
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None

a year ago

@firstyear thanks for your comment.

I think filter optimization is a separate improvement. It creates an optimal order/processing of the filter components. It is an internal task that requires no admin/support analysis. That optimization also targets the manipulation of several IDLs.

This ticket is about helping admins/support detect which requests are expensive, and why.
It targets the cost of filter/ACI evaluation once the final IDL (the candidate list) has been built, so it comes after the filter optimization ticket.
In your example, if etime is a concern and the candidate list is 10K, then indexing 'uid' will improve the situation. But if etime is high while the candidate list is small (say 10), then indexing 'uid' will not help this specific request; it is more likely the ACLs that are problematic.
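That rule of thumb could be sketched as a small triage helper. This is purely illustrative: the function and its thresholds are hypothetical, not part of 389-ds or any existing tooling.

```python
def triage(etime, candidates, slow_etime=1.0, large_idl=1000):
    """Illustrative triage of a slow SRCH, per the reasoning above:
    a large candidate list points at indexing, a small one at ACL cost.
    The slow_etime and large_idl thresholds are made-up examples."""
    if etime < slow_etime:
        return "ok"
    if candidates >= large_idl:
        return "consider indexing the filter attributes"
    return "candidate list is small; look at ACL evaluation cost"

print(triage(6.051, 12500))  # large candidate list -> indexing advice
print(triage(6.051, 10))     # small candidate list -> ACL advice
```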

The problem here is just saying "we checked 100,000 candidates", well, that may actually be the correct thing to do! We need to give really meaningful information to admins, not just data that they may misinterpret (and will misinterpret!). Failing that, we need something that, when we look at it, lets us see exactly what happened in the event.

I think that this metric is too coarse to derive profiling value from; what we actually need is probably a new access log level, which has start/end timer events around query processing steps. I.e.:

start op=1 time=0.00000000
    start pre-plugins time=0.0000001
        start plugin X time=.....
        end plugin X time=.....
    end pre-plugins time=.......
    start filter_candidates time=.....

        start intersection time=.....
            start idl (attr=uid) time=.....
            end idl (attr=uid) idl_size=X time=....
            start idl (attr=objectClass) time=.....
            end idl(attr=objectClass) idl_size=X time=....
        end intersection idl_size=X time=.....
        start load entries time=....
        end load entries time=.....
    end filter_candidates time=.....
end op=1 time=0.500000000

^ Imagine if we got a log like that from a customer for a slow query! I think that's a better goal to work towards, because it gives us a much, much more powerful way to reason about the event that occurred in the server.
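The nested start/end trace sketched above can be consumed mechanically. Here is a hypothetical parser, assuming the proposed format (which does not exist in 389-ds today); it pairs start/end events with a stack to recover per-phase durations.

```python
import re

# Hypothetical parser for the nested start/end trace proposed above; this
# log format is a sketch from the discussion, not an existing 389-ds level.
EVENT_RE = re.compile(
    r"^\s*(start|end)\s+(.+?)\s+(?:idl_size=\S+\s+)?time=([\d.]+)\s*$"
)

def parse_trace(lines):
    """Pair start/end events into a tree of {label, duration, children}."""
    root = {"label": "root", "children": []}
    stack = [root]
    for line in lines:
        m = EVENT_RE.match(line)
        if not m:
            continue
        kind, label, t = m.group(1), m.group(2), float(m.group(3))
        if kind == "start":
            node = {"label": label, "start": t, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        else:  # "end" closes the innermost open event
            node = stack.pop()
            node["duration"] = t - node["start"]
    return root["children"]

trace = """\
start op=1 time=0.0
    start filter_candidates time=0.1
    end filter_candidates time=0.4
end op=1 time=0.5
""".splitlines()

for op in parse_trace(trace):
    print(op["label"], op["duration"])  # per-phase durations fall out directly
```

The point of the nesting is exactly this: once the events pair up, the time spent in each phase (plugins, IDL intersection, entry loading) is a simple subtraction.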

Going to close this as a duplicate of 50350.

Metadata Update from @mreynolds:
- Issue close_status updated to: duplicate
- Issue status updated to: Closed (was: Open)

10 months ago
