#6363 Greenwave instability
Closed: Fixed 2 years ago Opened 2 years ago by puiterwijk.

We have Greenwave deployed in production; however, when I test it, a lot of requests fail.

Attached is a script that performs 400 requests to Greenwave for various NVRs.
When I run one instance, about 15% of requests fail.
When I run more instances, the failure rate goes up, to a record of 59% failed requests when I ran 10 at the same time.

This really needs to be fixed before we enable gating in Bodhi, or a lot of requests will keep failing.
Also, the response times range from 0.07s up to the worst I've seen, 1m5s for a single response.

I have opened this ticket here rather than on Greenwave's project because the cause might be Greenwave itself or any of the services it depends on.

test.py
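
The attached script is not reproduced in this ticket. As a rough sketch only, a load test of the shape described above (a fixed number of requests per NVR, reporting min, max, and total response time plus success and failure counts) might look like the following. The names `run_load_test`, `report`, and `send` are hypothetical and not taken from the attachment; `send` stands in for whatever performs a single request to Greenwave and returns True on success.

```python
import time
from datetime import timedelta


def run_load_test(send, nvrs, per_nvr=100):
    """Call send(nvr) per_nvr times for each NVR and collect timing stats.

    `send` is a hypothetical callable that performs one request and
    returns True on success, False on failure.
    """
    results = {}
    for nvr in nvrs:
        durations, ok = [], 0
        for _ in range(per_nvr):
            start = time.monotonic()
            try:
                success = bool(send(nvr))
            except Exception:
                success = False  # treat any raised error as a failed request
            durations.append(time.monotonic() - start)
            ok += success
        results[nvr] = {
            "min": min(durations),
            "max": max(durations),
            "total": sum(durations),
            "success": ok,
            "failed": per_nvr - ok,
        }
    return results


def report(results):
    """Print per-NVR stats in a layout similar to the output in this ticket."""
    for nvr, stats in results.items():
        print(f"NVR: {nvr}")
        for key in ("min", "max", "total"):
            print(f"{key.capitalize()}: {timedelta(seconds=stats[key])}")
        print(f"Success: {stats['success']}")
        print(f"Failed: {stats['failed']}")
        print()
```

With four NVRs and the default of 100 requests each, this gives the 400 requests mentioned above. The request URL and payload are deliberately left out, since the ticket does not show them.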


@dcallagh @mjia and @ralph are probably interested in this ticket.

I adjusted @puiterwijk's script a little, and this is what I got with it:

NVR: nonsense-1.0.0-1.fc26
Min: 0:00:00.963963
Max: 0:00:06.069342
Total: 0:01:53.772923
Succes: 100
Failed: 0

NVR: fedrepo-req-1.4.0-1.fc26
Min: 0:00:04.730392
Max: 0:00:18.170778
Total: 0:17:01.100949
Succes: 73
Failed: 27

NVR: rpm-4.13.0.1-7.f26
Min: 0:00:01.080774
Max: 0:00:05.545151
Total: 0:02:21.010905
Succes: 100
Failed: 0

NVR: cpio-2.12-5.fc26
Min: 0:00:00.958010
Max: 0:00:02.153654
Total: 0:01:51.780133
Succes: 100
Failed: 0

********************

Min: 0:00:00.958010
Max: 0:00:18.170778
Total: 0:23:07.664910
Succes: 373
Failed: 27

Re-running it another time got me:

NVR: nonsense-1.0.0-1.fc26
Min: 0:00:00.979270
Max: 0:00:06.211250
Total: 0:01:55.185378
Succes: 100
Failed: 0

NVR: fedrepo-req-1.4.0-1.fc26
Min: 0:00:06.827710
Max: 0:00:17.735626
Total: 0:15:41.357520
Succes: 2
Failed: 98  # that's because the first request failed, so only 2 other requests failed in the same way

NVR: rpm-4.13.0.1-7.f26
Min: 0:00:01.218863
Max: 0:00:06.576177
Total: 0:02:22.237068
Succes: 100
Failed: 0

NVR: cpio-2.12-5.fc26
Min: 0:00:01.142983
Max: 0:00:06.345329
Total: 0:02:21.068078
Succes: 100
Failed: 0

********************

Min: 0:00:00.979270
Max: 0:00:17.735626
Total: 0:22:19.848044
Succes: 302
Failed: 98

So clearly, fedrepo-req-1.4.0-1.fc26 is the most problematic one, for some reason.
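
To make the comparison concrete, the counts from the two runs above can be turned into failure rates. This is only a small illustrative sketch; the dictionary layout and the function name `failure_rate` are my own, and the numbers are copied from the output above.

```python
# (success, failed) counts per NVR, copied from the two runs above.
runs = {
    "run 1": {
        "nonsense-1.0.0-1.fc26": (100, 0),
        "fedrepo-req-1.4.0-1.fc26": (73, 27),
        "rpm-4.13.0.1-7.f26": (100, 0),
        "cpio-2.12-5.fc26": (100, 0),
    },
    "run 2": {
        "nonsense-1.0.0-1.fc26": (100, 0),
        "fedrepo-req-1.4.0-1.fc26": (2, 98),
        "rpm-4.13.0.1-7.f26": (100, 0),
        "cpio-2.12-5.fc26": (100, 0),
    },
}


def failure_rate(success, failed):
    """Return the percentage of failed requests."""
    return 100.0 * failed / (success + failed)


for name, counts in runs.items():
    total_ok = sum(ok for ok, _ in counts.values())
    total_failed = sum(failed for _, failed in counts.values())
    print(f"{name}: overall {failure_rate(total_ok, total_failed):.2f}% failed")
    for nvr, (ok, failed) in counts.items():
        if failed:
            print(f"  {nvr}: {failure_rate(ok, failed):.2f}% failed")
```

In both runs, every failure comes from the single fedrepo-req NVR; the other three NVRs have no failed requests.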

For fedrepo-req-1.4.0-1.fc26, it seems that WaiverDB failed to process a long query like this one:

https://paste.fedoraproject.org/paste/DVZAmt7Ib8Ni3VKQcFw69w

which caused Greenwave to return a 500. greenwave#83 should fix this problem.

Metadata Update from @ralph:
- Issue tagged with: greenwave

2 years ago

OK, with #6465 more or less resolved, I now get:

Min: 0:00:00.453122
Max: 0:00:09.025539
Total: 0:05:31.733311
Succes: 400
Failed: 0

Can this be closed now?

Sounds ok to me.

:christmas_tree:

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 years ago

:money_with_wings: Thanks!

(FWIW, I think @puiterwijk was going to circle back here and try to hit the app again with 10000000 processes.)

@ralph New results of the worst one in the batch:

Min: 0:00:00.604409
Max: 0:00:13.211843
Total: 0:05:24.884602
Succes: 400
Failed: 0

So, the max time has decreased a lot, and no request failed in any of my tests.
The max time is still 13 seconds, but given that this is an outlier and most runs have a max of 1 second, I'd say it might just have been some networking fun somewhere.
I'd consider this issue fixed.
