#299 Greenwave gave 503 codes today
Closed: Fixed 5 years ago Opened 5 years ago by bowlofeggs.

bodhi-backend02 recorded HTTP 503 codes from Greenwave today:

Message

[2018-09-13 11:58:44][bodhi.server ERROR]

Bodhi failed to send POST request to Greenwave at the following URL "https://greenwave-web-greenwave.app.os.fedoraproject.org/api/v1.0/decision". The status code was "503". The error was "".

Process Details

  • host: bodhi-backend02.phx2.fedoraproject.org
  • PID: 34691
  • name: fedmsg-hub
  • command: /usr/bin/python2 /usr/bin/fedmsg-hub
  • msg_id:

Callstack that lead to the logging statement

  File "/usr/lib64/python2.7/threading.py", line 777 in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib64/python2.7/threading.py", line 804 in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/threading.py", line 757 in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.7/site-packages/twisted/_threads/_threadworker.py", line 46 in work
    task()
  File "/usr/lib64/python2.7/site-packages/twisted/_threads/_team.py", line 190 in doWork
    task()
  File "/usr/lib64/python2.7/site-packages/twisted/python/threadpool.py", line 246 in inContext
    result = inContext.theWork()
  File "/usr/lib64/python2.7/site-packages/twisted/python/threadpool.py", line 262 in <lambda>
    inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
  File "/usr/lib64/python2.7/site-packages/twisted/python/context.py", line 118 in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/lib64/python2.7/site-packages/twisted/python/context.py", line 81 in callWithContext
    return func(*args,**kw)
  File "/usr/lib/python2.7/site-packages/moksha/hub/api/consumer.py", line 185 in _work_loop
    self._do_work(message)
  File "/usr/lib/python2.7/site-packages/moksha/hub/api/consumer.py", line 206 in _do_work
    self.consume(message)
  File "/usr/lib/python2.7/site-packages/bodhi/server/consumers/updates.py", line 141 in consume
    update.update_test_gating_status()
  File "/usr/lib/python2.7/site-packages/bodhi/server/models.py", line 1991 in update_test_gating_status
    decision = self.get_test_gating_info()
  File "/usr/lib/python2.7/site-packages/bodhi/server/models.py", line 1987 in get_test_gating_info
    return bodhi.server.util.greenwave_api_post(api_url, data)
  File "/usr/lib/python2.7/site-packages/bodhi/server/util.py", line 1349 in greenwave_api_post
    data=data, retries=3)
  File "/usr/lib/python2.7/site-packages/bodhi/server/util.py", line 1281 in call_api
    return call_api(api_url, service_name, error_key, method, data, headers, retries - 1)
  File "/usr/lib/python2.7/site-packages/bodhi/server/util.py", line 1281 in call_api
    return call_api(api_url, service_name, error_key, method, data, headers, retries - 1)
  File "/usr/lib/python2.7/site-packages/bodhi/server/util.py", line 1281 in call_api
    return call_api(api_url, service_name, error_key, method, data, headers, retries - 1)
  File "/usr/lib/python2.7/site-packages/bodhi/server/util.py", line 1300 in call_api
    log.error(error_msg)

Hey Randy: a couple questions. Want to see if this is transient.

  • Do you have a urllib3 retry handler on it to try more than once?
  • Do you have a guess as to the error rate? (How often does this happen)

I see these in the greenwave logs:

2018-09-13 14:44:01 [pid    11] greenwave.resources WARNING Making a decision about Bodhi update FEDORA-2018-7b4c276394 failed: <Response [503]>
2018-09-13 14:44:04 [pid    10] greenwave.resources WARNING Making a decision about Bodhi update FEDORA-EPEL-2018-e074c21a8f failed: <Response [503]>
2018-09-13 14:44:17 [pid    16] greenwave.resources WARNING Making a decision about Bodhi update FEDORA-2018-1bbe1f76a3 failed: <Response [503]>
2018-09-13 14:45:21 [pid     9] greenwave.resources WARNING Making a decision about Bodhi update FEDORA-2018-84a758e93b failed: <Response [503]>
2018-09-13 14:45:26 [pid    19] greenwave.resources WARNING Making a decision about Bodhi update FEDORA-2018-134e97aa5d failed: <Response [503]>

Bodhi does not use a retry handler. It happens several times a week I'd say (it happened again today around noon:08 EDT).

This happened again a few times on the 15th, and today as well.

Tonight I've observed a high (~50%) failure rate in loading Bodhi update pages with the "failed to talk to Greenwave" message. Loading the same update repeatedly sometimes works and sometimes fails.

@gnaponie, once #295 is pushed out in Fedora, it will be cool to see if it gives us any insight on the issue here.

Thinking. @rbarlow, now that freeze is up we should prepare a new release of greenwave to prod. The eager/lazy results will help the backend (but not the frontend, which still will pass verbose=True, getting all results).

Have you noticed any integration issues with stg that should block a new release?

I haven't, but it is very difficult to use stg for testing since there usually isn't test data there.

Also, I saw a number of timeouts from Greenwave on Bodhi today.

Can confirm it is intermittent today. I see this in the API response from greenwave to Bodhi:

  "message": "Timeout connecting to upstream server: HTTPSConnectionPool(host='waiverdb-web-waiverdb.app.os.fedoraproject.org', port=443): Read timed out. (read timeout=15)"

Well that's weird.

  • I don't see anything but 200s in the waiverdb logs.
  • I don't have data on waiverdb response times, unfortunately. Prometheus metrics would give us that but it's not wired up in Fedora yet.
  • Perhaps this is a routing issue inside os.fp.o? Guessing.

Yeah, something seems to be up. If I run this 10 times, it will just hang intermittently:

$ curl https://greenwave-web-greenwave.app.os.fedoraproject.org/api/v1.0/about

I just pinged @puiterwijk in #fedora-noc (he's oncall) to see if he sees anything fishy. :fish:

Of course, #fedora-noc is amazing:

     nirik │ ah, I see.                                      
     nirik │ I see what it is.                               
puiterwijk │ oh?                                             
     nirik │ the new nodes didn't pick up their right vpn ips
puiterwijk │ ah. Well, that'd do it...

OK, @kevin restarted some nodes, and now:

λ ab -n 200 -c 10 https://greenwave-web-greenwave.app.os.fedoraproject.org/api/v1.0/about
This is ApacheBench, Version 2.3 <$Revision: 1826891 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking greenwave-web-greenwave.app.os.fedoraproject.org (be patient)
Completed 100 requests
Completed 200 requests
Finished 200 requests


Server Software:        gunicorn/19.7.1
Server Hostname:        greenwave-web-greenwave.app.os.fedoraproject.org
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,4096,128
TLS Server Name:        greenwave-web-greenwave.app.os.fedoraproject.org

Document Path:          /api/v1.0/about
Document Length:        25 bytes

Concurrency Level:      10
Time taken for tests:   5.781 seconds
Complete requests:      200
Failed requests:        0
Total transferred:      142029 bytes
HTML transferred:       5000 bytes
Requests per second:    34.60 [#/sec] (mean)
Time per request:       289.032 [ms] (mean)
Time per request:       28.903 [ms] (mean, across all concurrent requests)
Transfer rate:          23.99 [Kbytes/sec] received

Connection Times (ms)
          min  mean[+/-sd] median   max
Connect:      122  136   8.2    136     170
Processing:    98  131  65.6    103     301
Waiting:       97  130  65.8    102     301
Total:        225  267  67.3    240     448

Percentage of the requests served within a certain time (ms)
  50%    240
  66%    243
  75%    249
  80%    255
  90%    425
  95%    432
  98%    436
  99%    441
 100%    448 (longest request)

Look any better now on your end @rbarlow?

I haven't seen further timeouts, so looks good from here. Thanks!

Metadata Update from @ralph:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

5 years ago

Login to comment on this ticket.

Metadata