#50361 When the server detects a possible worker starvation it should log a warning
Closed: wontfix 3 years ago by spichugi. Opened 4 years ago by tbordaz.

Issue Description

If all workers are busy, some operations may spend some time in the work queue.
As a result, the etime of an operation can be high (several seconds) even though its start/end timestamps show that the operation itself was immediate.
This is sometimes difficult to detect, so the server could log a warning about a possible starvation and how to fix it.

Starvation could be detected by:
- number of connections in gettingber > threshold
- op_initiated-op_completed > threshold
- ...
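
The two checks listed above could be sketched roughly as follows. This is only an illustration: `ServerCounters`, `starvation_signals`, and the default thresholds are hypothetical names and values, not existing server code or tuned settings.

```python
from dataclasses import dataclass


@dataclass
class ServerCounters:
    # Hypothetical snapshot of server counters; field names are illustrative only.
    conns_in_gettingber: int
    ops_initiated: int
    ops_completed: int


def starvation_signals(c, max_gettingber=16, max_backlog=64):
    """Return the reasons (possibly none) why a snapshot suggests worker starvation.

    The default thresholds are placeholders, not recommended values.
    """
    reasons = []
    if c.conns_in_gettingber > max_gettingber:
        reasons.append(
            f"{c.conns_in_gettingber} connections in gettingber "
            f"(threshold {max_gettingber})"
        )
    backlog = c.ops_initiated - c.ops_completed
    if backlog > max_backlog:
        reasons.append(
            f"{backlog} operations initiated but not completed "
            f"(threshold {max_backlog})"
        )
    return reasons


# A warning would be logged only when at least one signal fires.
for reason in starvation_signals(ServerCounters(20, 1000, 900)):
    print("possible worker starvation:", reason)
```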

Package Version and Platform

All versions

Steps to reproduce

Not easy, I guess.

Actual results

Logs do not contain a starvation warning.

Expected results

The log should contain a warning/correction.


If this is simply a timing issue, I think we should have a separate timer for "time on worker" as opposed to "time from operation submitted to queue to response sent to client". Certainly this would be good to have, and it relates nicely to some of the logging improvements I want to make in the future.
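
To make the two-timer idea concrete, here is a minimal sketch. `OpTiming` and its field names are hypothetical; the point is only that recording one extra timestamp (when a worker picks the operation up) lets the total be split into queue wait and worker time.

```python
from dataclasses import dataclass


@dataclass
class OpTiming:
    # Hypothetical per-operation timestamps, in seconds from a monotonic clock.
    enqueued: float   # operation submitted to the work queue
    started: float    # a worker picked the operation up
    finished: float   # response sent to the client

    def queue_wait(self):
        """Time spent waiting in the work queue before any worker was free."""
        return self.started - self.enqueued

    def worker_time(self):
        """Time the worker actually spent on the operation."""
        return self.finished - self.started

    def total(self):
        """Time from submission to the queue until the response was sent."""
        return self.finished - self.enqueued


# An operation that was fast on the worker but waited 3 seconds in the queue.
t = OpTiming(enqueued=10.0, started=13.0, finished=13.25)
print(t.queue_wait(), t.worker_time(), t.total())
```

With both values logged, a high total combined with a small worker time would point at the queue rather than at the operation itself.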

Perhaps an easy way to detect the starvation is to check whether queue len > 2x threads, because that shows we can't do work fast enough to process the ops, and then to clear the pressure warning once queue len <= 1x threads.
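
As a rough sketch of that check (the class name and the `observe` method are made up for illustration), using two different thresholds means the warning is raised once and cleared once instead of flapping around a single limit:

```python
class QueuePressureMonitor:
    """Hypothetical detector: warn above 2x threads, clear at or below 1x threads."""

    def __init__(self, threads, high_factor=2, low_factor=1):
        self.high = high_factor * threads
        self.low = low_factor * threads
        self.under_pressure = False

    def observe(self, queue_len):
        """Return "warn" or "clear" on a state change, otherwise None."""
        if not self.under_pressure and queue_len > self.high:
            self.under_pressure = True
            return "warn"
        if self.under_pressure and queue_len <= self.low:
            self.under_pressure = False
            return "clear"
        return None


monitor = QueuePressureMonitor(threads=16)
for queue_len in (10, 33, 40, 20, 16):
    event = monitor.observe(queue_len)
    if event is not None:
        print(queue_len, event)
```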

What are the possible remediations? Without good, detailed logging of what's going wrong inside operations, I think it would be hard to indicate proper corrective actions. So I guess we should focus on logging and diagnostics first, because that is a superset of this problem?

In terms of anything more advanced, we'd probably be talking about a full-on scheduler, but I think that would be hard to build well, and I'm not sure we should consider it at this point.

Does that all seem reasonable? Or am I misunderstanding something?

Metadata Update from @firstyear:
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None

4 years ago

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.4.2

4 years ago

Metadata Update from @mreynolds:
- Issue priority set to: normal
- Issue set to the milestone: 1.4.3 (was: 1.4.2)

4 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/3420

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix
- Issue status updated to: Closed (was: Open)

3 years ago
