Issue #50674: RFE Extend log of operations statistics in access log - 389-ds-base

389-ds-base

#50674 RFE Extend log of operations statistics in access log

Closed: wontfix 3 years ago by spichugi. Opened 4 years ago by tbordaz.

Issue Description

Operation etime can be a mystery... why is it so long ?
We need to gather internal info but debug log are noisy and disturb the result itself.
This ticket is a proof of concept of kind of raw metrics logged on demand (with appropriate access log level). Being buffered logging has no impact on the operation itself.

Package Version and Platform

all version

Steps to reproduce

The problem raise with substring search with very long initial assertion that is common to a large number of entries. Parallel requests create a bottleneck on dbc_get and etime appears as a mystery.

Actual results

No stat related to an operation

Expected results

[25/Oct/2019:16:23:28.620364576 +0200] conn=1 op=1 SRCH base="dc=example,dc=com" scope=2 filter="(sn=user*)" attrs=ALL
[25/Oct/2019:16:23:28.623678548 +0200] conn=1 op=1 STAT attribute=objectclass key(eq)=referral --> count 0
[25/Oct/2019:16:23:28.623697194 +0200] conn=1 op=1 STAT attribute=sn key(sub)=ser --> count 21
[25/Oct/2019:16:23:28.623698763 +0200] conn=1 op=1 STAT attribute=sn key(sub)=use --> count 24
[25/Oct/2019:16:23:28.623700176 +0200] conn=1 op=1 STAT attribute=sn key(sub)=^us --> count 20
[25/Oct/2019:16:23:28.623701596 +0200] conn=1 op=1 RESULT err=0 tag=101 nentries=20 etime=0.003488420

tbordaz commented 4 years ago

WIP https://pagure.io/389-ds-base/pull-request/50675

Metadata Update from @tbordaz:
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None

4 years ago

lkrispen commented 4 years ago

you say "Being buffered logging has no impact on the operation itself." but isn't the access to the log buffer using a lock ? for many concurrent substring searches there could be an impact

lkrispen commented 4 years ago

if I understand it correctly you gather stats and then iterate over the stats and log them, but then the logged timestamp does not match the time the stat was collected and there is no indication which of the partial searches took most of the time

tbordaz commented 4 years ago

@lkrispen the statistics are logged after the operation result was sent back but before it is logged in access log. So I believe contention on access log would keep a worker busy longer time but should not affect the operation response time seen by the ldapclient.
Currently statistic logs are quite small but may increase with new monitored areas.
I think statistics logging could be controlled by accesslog-level to be not too much noisy.
also statistics gathering could be controlled by accesslog-level to reduce the performance impact.

I agree that timing is important. timing should be collected during stats gathering so that we can retrieve precisely when and how long specific probe occurred and last.

I will write a small design explaining the mechanism and how it can be tune

log on specific access level
collect on specific access level
collect timestamp+duration on probes

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.4.2

4 years ago

tbordaz commented 4 years ago

Here is the design http://www.port389.org/docs/389ds/design/log-operation-stats.html

firstyear commented 4 years ago

Commenting on the design:

[25/Oct/2019:16:23:28.623698763 +0200] conn=1 op=1 STAT filter index lookup: attribute=sn key(sub)=use --> count 24

This line could be clearer. What's count? nanoseconds? idl length? Let's describe what the data is that we are seeing better so that anyone can parse this.

You also have:

typedef struct op_stat {
}

You should include two timestamps, taken from CLOCK_MONOTONIC so we can time each "op_stat" that we take to include a duration field on each line, rather than just in the key lookup I think.

Metadata Update from @mreynolds:
- Issue priority set to: normal
- Issue set to the milestone: 1.4.3 (was: 1.4.2)

4 years ago

Metadata Update from @mreynolds:
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1859271

3 years ago

mreynolds commented 3 years ago

Issue linked to Bugzilla: Bug 1859271

spichugi commented 3 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/3729

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix
- Issue status updated to: Closed (was: Open)

3 years ago

Metadata

Assignee

None

Tags

None

Blocking

None

Depending on

None

Priority

normal

Milestone

1.4.3

reviewstatus

None

rhbz

https://bugzilla.redhat.com/show_bug.cgi?id=1859271

origin

None

389-ds-base

Source Code

#50674 RFE Extend log of operations statistics in access log Closed: wontfix 3 years ago by spichugi. Opened 4 years ago by tbordaz.

Issue Description

Package Version and Platform

Steps to reproduce

Actual results

Expected results

Metadata

#50674 RFE Extend log of operations statistics in access log

Closed: wontfix 3 years ago by spichugi. Opened 4 years ago by tbordaz.