When testing the newly deployed mdapi on OpenShift (mdapi.fp.o) with fedora-packaging indexing, I get a lot of connection errors.
On the client side (fedora-packages making requests to mdapi):
Exception in thread Thread-26:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 804, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/threading.py", line 757, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/site-packages/fedoracommunity/pool.py", line 33, in run
    result = func(item)
  File "/usr/lib/python2.7/site-packages/fedoracommunity/search/index.py", line 365, in io_work
    package = self.construct_package_dictionary(package)
  File "/usr/lib/python2.7/site-packages/fedoracommunity/search/index.py", line 288, in construct_package_dictionary
    package['sub_pkgs'] = list(self.get_sub_packages(package))
  File "/usr/lib/python2.7/site-packages/fedoracommunity/search/index.py", line 306, in get_sub_packages
    response = local.http.get(url)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 537, in get
    return self.request('GET', url, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 524, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 637, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', BadStatusLine("''",))
On the server side (mdapi in OpenShift):
2019-05-15 11:31:22,850 [ERROR] aiohttp.server: Unhandled exception
Traceback (most recent call last):
  File "/usr/lib64/python3.7/site-packages/aiohttp/web_protocol.py", line 411, in start
    await resp.write_eof()
  File "/usr/lib64/python3.7/site-packages/aiohttp/web_response.py", line 596, in write_eof
    await super().write_eof(body)
  File "/usr/lib64/python3.7/site-packages/aiohttp/web_response.py", line 401, in write_eof
    await self._payload_writer.write_eof(data)
  File "/usr/lib64/python3.7/site-packages/aiohttp/http_writer.py", line 136, in write_eof
    self._write(chunk)
  File "/usr/lib64/python3.7/site-packages/aiohttp/http_writer.py", line 67, in _write
    raise ConnectionResetError('Cannot write to closing transport')
ConnectionResetError: Cannot write to closing transport
I am wondering whether the connections are getting killed by the proxies, but I don't know enough about how the proxies work to be sure.
What host(s) are the connections using, resolving to what?
Metadata Update from @kevin: - Issue priority set to: Waiting on Assignee (was: Needs Review)
So the connections are against mdapi.fedoraproject.org and it resolves to either proxy110.phx2.fedoraproject.org (10.5.126.9) or proxy101.phx2.fedoraproject.org (10.5.126.8).
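For anyone who wants to repeat that check, here is a minimal sketch of listing the addresses a hostname resolves to, using only the standard library. The helper name `resolve_addresses` is made up for illustration, and the result depends on where it runs (the 10.5.126.x addresses quoted above would only be expected from inside the infrastructure network).

```python
import socket


def resolve_addresses(host, port=443):
    """Return the sorted, de-duplicated addresses that `host` resolves to.

    Hypothetical helper for illustration; it only wraps socket.getaddrinfo.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Hostname taken from the comment above; output varies by network.
    print(resolve_addresses("mdapi.fedoraproject.org"))
```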
@cverna out of curiosity, is this still happening?
I have not tried again since last time. I wanted to try it locally to see if I could reproduce the error.
So I spent a little time looking at this today and found that the responses are much slower on OpenShift.
One difference is the version of Python (3.7 in OpenShift and 3.6 in the VM). I am not sure if there were major changes around asyncio between these versions, but that could be an explanation.
See the performance test results below.
ab -c 100 -n 100 https://mdapi.fedoraproject.org/rawhide/pkg/guake
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking mdapi.fedoraproject.org (be patient).....done

Server Software:        Python/3.7
Server Hostname:        mdapi.fedoraproject.org
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,4096,128
Server Temp Key:        X25519 253 bits
TLS Server Name:        mdapi.fedoraproject.org

Document Path:          /rawhide/pkg/guake
Document Length:        2676 bytes

Concurrency Level:      100
Time taken for tests:   34.945 seconds
Complete requests:      100
Failed requests:        74
   (Connect: 0, Receive: 0, Length: 74, Exceptions: 0)
Non-2xx responses:      74
Total transferred:      127137 bytes
HTML transferred:       76384 bytes
Requests per second:    2.86 [#/sec] (mean)
Time per request:       34944.770 [ms] (mean)
Time per request:       349.448 [ms] (mean, across all concurrent requests)
Transfer rate:          3.55 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      386   478   50.6    485    558
Processing:  1453 26441 8068.3  30401  30716
Waiting:     1453 26441 8068.7  30401  30716
Total:       1926 26919 8090.6  30881  31242

Percentage of the requests served within a certain time (ms)
  50%  30881
  66%  30922
  75%  30936
  80%  30939
  90%  30956
  95%  30960
  98%  31184
  99%  31242
 100%  31242 (longest request)
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking apps.fedoraproject.org (be patient).....done

Server Software:        Python/3.6
Server Hostname:        apps.fedoraproject.org
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,4096,128
Server Temp Key:        X25519 253 bits
TLS Server Name:        apps.fedoraproject.org

Document Path:          //mdapi/rawhide/pkg/guake
Document Length:        2676 bytes

Concurrency Level:      100
Time taken for tests:   6.130 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      317781 bytes
HTML transferred:       267600 bytes
Requests per second:    16.31 [#/sec] (mean)
Time per request:       6129.748 [ms] (mean)
Time per request:       61.297 [ms] (mean, across all concurrent requests)
Transfer rate:          50.63 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      381   560   82.1    593    646
Processing:   343  2510 1275.0   2480   4866
Waiting:      326  2493 1268.8   2480   4866
Total:        726  3070 1323.7   3102   5417

Percentage of the requests served within a certain time (ms)
  50%   3102
  66%   3682
  75%   4214
  80%   4530
  90%   5013
  95%   5183
  98%   5392
  99%   5417
 100%   5417 (longest request)
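To put the two runs above side by side, here is a small sketch that recomputes throughput and failure rate from the figures ab reported. The numbers are copied from the output above; the `AbRun` type and its field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class AbRun:
    """Headline figures from one ab run (hypothetical container type)."""
    name: str
    seconds: float   # "Time taken for tests"
    complete: int    # "Complete requests"
    failed: int      # "Failed requests"

    @property
    def requests_per_second(self):
        return self.complete / self.seconds

    @property
    def failure_rate(self):
        return self.failed / self.complete


# Values copied from the two ab outputs in this ticket.
openshift = AbRun("OpenShift (Python 3.7)", 34.945, 100, 74)
vm = AbRun("VM (Python 3.6)", 6.130, 100, 0)

# How many times more requests per second the VM served in these runs.
slowdown = vm.requests_per_second / openshift.requests_per_second

if __name__ == "__main__":
    for run in (openshift, vm):
        print(f"{run.name}: {run.requests_per_second:.2f} req/s, "
              f"{run.failure_rate:.0%} failed")
    print(f"VM throughput is about {slowdown:.1f}x the OpenShift throughput")
```

On these two runs the VM deployment handled roughly 5.7 times more requests per second, and only the OpenShift run had failed requests.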
OK, so I got to the bottom of it: this commit is hurting the performance (https://pagure.io/mdapi/c/2e5b04dc45138192b34f6ea39cbed0023d716f9a?branch=master).
I guess the type of lock file is now blocking, which makes the performance worse. I'll close this and open a ticket upstream.
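As a rough illustration of why a blocking call would hurt an asyncio server this much, the sketch below compares concurrent handlers that block the event loop with handlers that yield to it. This is not the mdapi code: `time.sleep` merely stands in for a blocking lock acquisition, and all names and the delay value are assumptions.

```python
import asyncio
import time

DELAY = 0.05  # hypothetical time spent waiting on the lock, in seconds


async def handler_blocking():
    # Stand-in for a blocking lock: the whole event loop stalls here,
    # so no other request can make progress in the meantime.
    time.sleep(DELAY)


async def handler_cooperative():
    # Stand-in for a non-blocking wait: control returns to the event
    # loop, so the other handlers run concurrently.
    await asyncio.sleep(DELAY)


async def timed(handler, n):
    """Run n handlers concurrently and return the elapsed wall time."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start


def compare(n=5):
    blocking = asyncio.run(timed(handler_blocking, n))
    cooperative = asyncio.run(timed(handler_cooperative, n))
    return blocking, cooperative


if __name__ == "__main__":
    blocking, cooperative = compare()
    print(f"blocking: {blocking:.3f}s, cooperative: {cooperative:.3f}s")
```

With blocking handlers the total time grows linearly with the number of concurrent requests, which would be consistent with the long processing times seen at a concurrency of 100.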
Metadata Update from @cverna: - Issue close_status updated to: Upstream - Issue status updated to: Closed (was: Open)