Issue #51259: performance search rate: contention on global monitoring counters - 389-ds-base

389-ds-base

#51259 performance search rate: contention on global monitoring counters

Closed: wontfix 3 years ago by spichugi. Opened 3 years ago by tbordaz.

Issue Description

Many counters are related to cn=monitoring (snmp, ops, maxthread..). They are global to the server and listener/workers update them frequently. Those (atomic) update look expensive as skipping them improves throughput by >2%.
This ticket is to evaluate the gain of moving some of those global counters to a per workers/listener. Then on cn=monitor search, the per worker/listener counter would be consolidate into a global server counter.

Package Version and Platform

all

Steps to reproduce

to be provided

Actual results

slapi_counter_add is quite high in perf report

Expected results

Throughput increase and slapi_counter_add should be lower in perf

firstyear commented 3 years ago

There is another way to do this. We have a statistic struct "per operation" and at the end of the op we queue that for a thread that does post-processing into the counters. I've suggested this model before I think and it would resolve a lot of the concerns you have here.

The way it works is you pass the stats struct pointer through out the operation which toggles counters in that struct. Since it's a per-operation struct, it's isolated to just that operation and the current working thread (no atomics needed, thread safe etc). Then at the end you queue it (since lock or lock-free append) and a seperate thread would dequeue it and add the values to the main stats store. This would mean there is a slight delay between an operation and the stats appearing, but it would be small, and stats are meant to provide "longer term" views of server behaviour, rather than "exactly at this moment" views. I think that would be a good way to go for this as an improvement over atomics.

Metadata Update from @firstyear:
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None

3 years ago

firstyear commented 3 years ago

Ohhh and a benefit to my suggestion is we can log the stats in the access log at the end of an operation to get per-operation stats for later diagnostics about a single operation if needed.

tbordaz commented 3 years ago

I like your proposal but I think it is overcomplicated for this specific ticket that is about global servers counters.

Your proposal is somehow related to that ticket/PR (https://pagure.io/389-ds-base/issue/50674) I opened. Having statistics gathered per operations (in an operation object_extension) prevents contention and log in access logs have low impact on performance. It can be useful to diagnose an operation stats.

This ticket is related to general server counters which are only available through 'cn=monitor'. Each worker having its own counters (in a per worker struct) avoid contention. Consolidation of those values is to be made on demand (srch cn=monitor) with atomic read of per threads counters. That is simpler than having a dedicated thread receiving operation struct and continuously consolidate general server counters even if they useless (no srch cn=monitor).

firstyear commented 3 years ago

This ticket is related to general server counters which are only available through 'cn=monitor'. Each worker having its own counters (in a per worker struct) avoid contention. Consolidation of those values is to be made on demand (srch cn=monitor) with atomic read of per threads counters.

Sure, but remember that if you have atomic reads of the counters, the threads must also be performing atomic writes still else ordering and integrity of the values is not guaranteed. So this will reduce contention yes, but it won't reduce the number of atomics so still expect some overhead here. Saying this, you should be able to use "Relaxed" memory ordering in this case to avoid the creation of and acknowledgement of cache invalidation reqs.

If you want to do per-worker-thread counters you may be better off to just have a struct of counters that the worker uses a lock or a copy-on-write type on though.

My concern still would be that for per-worker types that that will create a lot of iteration when cn=monitor is called and a lot of contention during the cn=monitor read phase (but the benefit being reduction of contention during normal operation). IE a read to cn=monitor could cause workers to briefly stall during their processing. But it's probably not a large stall and my concern here is probably not reasonable.

spichugi commented 3 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/4312

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix
- Issue status updated to: Closed (was: Open)

3 years ago

Metadata

Assignee

None

Tags

None

Blocking

None

Depending on

None

Priority

None

Milestone

None

reviewstatus

None

rhbz

None

origin

None

389-ds-base

Source Code

#51259 performance search rate: contention on global monitoring counters Closed: wontfix 3 years ago by spichugi. Opened 3 years ago by tbordaz.

Issue Description

Package Version and Platform

Steps to reproduce

Actual results

Expected results

Metadata

#51259 performance search rate: contention on global monitoring counters

Closed: wontfix 3 years ago by spichugi. Opened 3 years ago by tbordaz.