If all workers are busy, some operations may spend some time in the work queue. As a result, the etime of an operation can be high (several seconds) even though its start/end timestamps show that the operation itself was immediate. This is sometimes difficult to detect, so the server could log a warning about possible starvation and how to fix it.
Starvation could be detected by:
- number of connections in gettingber > threshold
- op_initiated - op_completed > threshold
- ...
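The two checks listed above could be sketched roughly as follows. This is only an illustration: the `ServerCounters` type, its field names, and both thresholds are hypothetical and not taken from the server code.

```python
from dataclasses import dataclass


@dataclass
class ServerCounters:
    # Hypothetical snapshot of server counters; names are illustrative only.
    op_initiated: int       # operations received so far
    op_completed: int       # operations fully processed so far
    conns_gettingber: int   # connections currently counted as "gettingber"


def starvation_suspected(c: ServerCounters,
                         backlog_threshold: int,
                         conn_threshold: int) -> bool:
    """Return True if either heuristic from the ticket trips."""
    backlog = c.op_initiated - c.op_completed
    return backlog > backlog_threshold or c.conns_gettingber > conn_threshold
```

Suitable threshold values are not specified in this ticket and would need tuning.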
All versions
Not easy, I guess
Logs do not contain a starvation warning
Logs should contain a warning and a suggested correction
If this is simply a timing issue, I think we should have a separate timer for "time on worker" as opposed to "time from operation submitted to queue to response sent to client". This would certainly be good to have, and it relates nicely to some of the logging improvements I want to make in the future.
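To make the two timers concrete, here is a minimal sketch that records three timestamps per operation and derives the queue wait, the time on the worker, and the total. The `OpTiming` class and its method names are hypothetical, not an existing server structure; timestamps are passed in explicitly so the caller can use whatever monotonic clock it prefers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpTiming:
    # Hypothetical per-operation timing record (seconds, monotonic clock).
    queued_at: float                     # operation submitted to the work queue
    started_at: Optional[float] = None   # a worker picked the operation up
    finished_at: Optional[float] = None  # response sent to the client

    def start(self, now: float) -> None:
        self.started_at = now

    def finish(self, now: float) -> None:
        self.finished_at = now

    @property
    def wait_time(self) -> float:
        """Time spent waiting in the queue before a worker started."""
        return self.started_at - self.queued_at

    @property
    def work_time(self) -> float:
        """Time spent on the worker ("time on worker")."""
        return self.finished_at - self.started_at

    @property
    def elapsed(self) -> float:
        """Total time from submission to response."""
        return self.finished_at - self.queued_at
```

With such a record, an operation that waited 3 seconds and ran for 0.5 seconds would report a total of 3.5 seconds, and the large wait component would point at worker starvation rather than a slow operation.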
Perhaps an easy way to detect starvation is to check whether queue len > 2x threads, because that shows we can't process the ops fast enough, and then to clear the pressure warning once queue len <= 1x threads.
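A minimal sketch of that two-threshold (hysteresis) check, assuming the queue length can be sampled periodically. The `PressureMonitor` name and the `"warn"`/`"clear"` return values are hypothetical; only the 2x and 1x factors come from the suggestion above.

```python
from typing import Optional


class PressureMonitor:
    """Raise a warning above high_factor * threads, clear it at or below low_factor * threads."""

    def __init__(self, threads: int, high_factor: int = 2, low_factor: int = 1):
        self.high = threads * high_factor
        self.low = threads * low_factor
        self.under_pressure = False

    def update(self, queue_len: int) -> Optional[str]:
        # Returns "warn" or "clear" on a state change, None otherwise.
        if not self.under_pressure and queue_len > self.high:
            self.under_pressure = True
            return "warn"
        if self.under_pressure and queue_len <= self.low:
            self.under_pressure = False
            return "clear"
        return None
```

Using two thresholds rather than one keeps the warning from flapping on and off when the queue length hovers around a single limit.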
What are the possible remediations? Without good, detailed logging of what's going wrong inside operations, I think it would be hard to indicate proper corrective actions. So I think we should focus on logging and diagnostics first, since that is a superset of this problem?
For anything more advanced, we'd probably be talking about a full-on scheduler, but that would be hard to build well, and I'm not sure we should consider it at this point.
Does that all seem reasonable? Or am I misunderstanding something?
Metadata Update from @firstyear: - Custom field origin adjusted to None - Custom field reviewstatus adjusted to None
Metadata Update from @mreynolds: - Issue set to the milestone: 1.4.2
Metadata Update from @mreynolds: - Issue priority set to: normal - Issue set to the milestone: 1.4.3 (was: 1.4.2)
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/3420
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix - Issue status updated to: Closed (was: Open)