So, we have Greenwave deployed in production; however, when I run some tests against it, a lot of requests fail.
Attached is a script that performs 400 requests to Greenwave for various NVRs.
When I run one instance, about 15% of requests fail.
When I run more instances in parallel, the failure percentage rises, peaking at 59% failed requests when I ran 10 at the same time.
This really needs to be fixed before we enable gating in Bodhi, or a lot of requests will keep failing constantly.
Also, the response times range from 0.07s up to the worst I've seen, 1m5s for a single response.
I have opened this ticket here rather than on the Greenwave project, because the problem might or might not be in Greenwave itself or in any of the services it depends on.
<img alt="test.py" src="/fedora-infrastructure/issue/raw/594694acca75e21f4ad831a8c6b967469721296e22fc8a545ea8204cbbd25a5f-test.py" />
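For reference, the kind of load test described above can be sketched roughly as follows. This is not the attached test.py; the `send` callable, request count, and worker count are all assumptions, and the actual Greenwave endpoint/payload would have to be plugged in by the caller.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def load_test(send, n_requests=400, workers=10):
    """Fire `send` n_requests times concurrently and collect stats.

    `send` is any zero-argument callable that raises on failure
    (e.g. a wrapper around an HTTP POST to Greenwave's decision
    endpoint -- hypothetical here, not the real script).
    Returns the failure count/rate and min/max response times.
    """
    timings = []
    failures = 0

    def one(_):
        start = time.monotonic()
        try:
            send()
            return time.monotonic() - start, False
        except Exception:
            return time.monotonic() - start, True

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for elapsed, failed in pool.map(one, range(n_requests)):
            timings.append(elapsed)
            failures += failed

    return {
        "failed": failures,
        "failure_rate": failures / n_requests,
        "min_s": min(timings),
        "max_s": max(timings),
    }
```

With a real `send` hitting Greenwave, this reproduces the numbers above: failure rate per batch plus the best/worst response time.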
@dcallagh @mjia and @ralph are probably interested in this ticket.
I adjusted @puiterwijk's script a little, and this is what I got with it:
Re-running it another time got me:
Failed: 98 # that's because the first request failed, so only 2 other requests failed in the same way
So clearly, fedrepo-req-1.4.0-1.fc26 is the most problematic one for some reason.
Handling this in greenwave#77.
For fedrepo-req-1.4.0-1.fc26, it seems like WaiverDB failed to process a long query like this
and caused Greenwave to return 500. greenwave#83 should be able to fix this problem.
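A common mitigation for this failure mode (purely illustrative here, not the actual greenwave#83 change) is to split an overly long subject list into smaller batches before querying WaiverDB. The `fetch_chunk` callable and the chunk size below are assumptions:

```python
def chunked(items, size):
    """Yield successive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def fetch_waivers(subjects, fetch_chunk, chunk_size=50):
    """Query waivers in batches instead of one huge request.

    `fetch_chunk` stands in for whatever call sends one batch of
    subjects to WaiverDB (hypothetical); `chunk_size` is an
    arbitrary guess, tuned to stay under the server's limits.
    """
    results = []
    for chunk in chunked(subjects, chunk_size):
        results.extend(fetch_chunk(chunk))
    return results
```

The design point is simply that each individual request stays small enough for WaiverDB to process, at the cost of a few extra round trips.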
Metadata Update from @ralph:
- Issue tagged with: greenwave
OK, with #6465 more or less resolved, I now get:
Can this be closed now?
Sounds ok to me.
Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)
(FWIW, I think @puiterwijk was going to circle back here and try to hit the app again with 10000000 processes.)
@ralph New results of the worst one in the batch:
So, the max time has decreased a lot, and no request has failed in any of my tests.
The max time is still 13 seconds, but given that it is an outlier and most runs have a max of 1 second, I'd say this might've just been some networking fun somewhere.
I'd consider this issue fixed.