#50361 When the server detects a possible worker starvation it should log a warning
Opened 9 months ago by tbordaz. Modified 9 months ago

Issue Description

If all workers are busy, some operations may spend some time in the work queue.
In the end, the etime of the operation can be high (several seconds) while the start/end timestamps show that the operation itself was immediate.
This is sometimes difficult to detect, so the server could log a warning about a possible starvation and how to fix it.

Starvation could be detected by:
- number of connections in gettingber > threshold
- op_initiated-op_completed > threshold
- ...
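
The two checks listed above could be sketched roughly as follows. This is only an illustration in Python, not server code: the function name, the `StarvationThresholds` type, and both threshold values are hypothetical and would need tuning against real workloads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StarvationThresholds:
    # Both limits are hypothetical placeholders, not values from the server.
    max_conns_in_gettingber: int = 16
    max_ops_in_flight: int = 64


def starvation_signals(conns_in_gettingber, ops_initiated, ops_completed,
                       limits=StarvationThresholds()):
    """Return one message per heuristic that currently exceeds its threshold."""
    signals = []
    if conns_in_gettingber > limits.max_conns_in_gettingber:
        signals.append(
            f"{conns_in_gettingber} connections in gettingber "
            f"(threshold {limits.max_conns_in_gettingber})"
        )
    # Operations that were started but have not completed yet.
    in_flight = ops_initiated - ops_completed
    if in_flight > limits.max_ops_in_flight:
        signals.append(
            f"{in_flight} operations initiated but not completed "
            f"(threshold {limits.max_ops_in_flight})"
        )
    return signals


for message in starvation_signals(20, 1000, 900):
    print("WARNING: possible worker starvation:", message)
```

An empty list means neither heuristic fired, so the caller can stay silent in the normal case.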

Package Version and Platform

All versions

Steps to reproduce

Not easy to reproduce, I guess.

Actual results

The logs do not contain a starvation warning.

Expected results

The log should contain a warning and a suggested correction.

If this is simply a timing issue, I think we should have a separate timer for "time on worker" compared to "time from operation submitted to queue to response sent to client". Certainly this would be good to have and relates nicely to some of the logging improvements I want to make in the future.
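
To make the distinction concrete, here is a minimal sketch of the separate timers, assuming three timestamps are recorded per operation. The class and field names (`OpTiming`, `queued_at`, `started_at`, `finished_at`) are hypothetical and only illustrate the idea; they are not existing server structures.

```python
from dataclasses import dataclass


@dataclass
class OpTiming:
    queued_at: float    # operation submitted to the work queue
    started_at: float   # a worker picked the operation up
    finished_at: float  # response sent to the client

    @property
    def queue_wait(self):
        """Time spent waiting for a free worker."""
        return self.started_at - self.queued_at

    @property
    def worker_time(self):
        """Time actually spent on a worker."""
        return self.finished_at - self.started_at

    @property
    def etime(self):
        """Total elapsed time as seen by the client."""
        return self.finished_at - self.queued_at


# An operation that was fast on the worker but waited a long time in the queue.
op = OpTiming(queued_at=10.0, started_at=13.5, finished_at=13.75)
print(f"etime={op.etime} queue_wait={op.queue_wait} worker_time={op.worker_time}")
```

A high etime with a small worker time and a large queue wait would point at starvation rather than at a slow operation.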

Perhaps an easy way to detect starvation is to warn when the queue length exceeds 2x the number of worker threads, because that shows we can't process the ops fast enough, and then to clear the pressure warning once the queue length drops to 1x the number of threads or below.
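
The two-threshold idea above is a form of hysteresis, and could look something like this sketch. The `PressureMonitor` class and its return values are hypothetical; only the 2x and 1x thresholds come from the suggestion above.

```python
class PressureMonitor:
    """Raise a warning above 2x threads, clear it at or below 1x threads."""

    def __init__(self, threads):
        self.threads = threads
        self.warning_active = False

    def update(self, queue_len):
        """Return "warn" or "clear" on a state change, otherwise None."""
        if not self.warning_active and queue_len > 2 * self.threads:
            self.warning_active = True
            return "warn"
        if self.warning_active and queue_len <= self.threads:
            self.warning_active = False
            return "clear"
        return None


monitor = PressureMonitor(threads=4)
for queue_len in (5, 9, 6, 4, 3):
    event = monitor.update(queue_len)
    if event is not None:
        print(f"queue_len={queue_len}: {event}")
```

Using two different thresholds keeps the warning from flapping when the queue length hovers around a single limit.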

What are the possible remediations? I think without good detailed logging of what's going wrong inside of operations, it would be hard to indicate proper corrective actions. So I think we should focus on logging and diagnostics first, because that is a superset of this problem.

In terms of anything more advanced, we'd probably be talking about a full-on scheduler, but I think that would be hard to build well, and I'm not sure we should consider it at this point.

Does that all seem reasonable? Or am I misunderstanding something?

Metadata Update from @firstyear:
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None

9 months ago

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.4.2

9 months ago
