#7798 mdapi connections error
Closed: Upstream 4 months ago by cverna. Opened 7 months ago by cverna.

When testing the newly deployed mdapi on OpenShift (mdapi.fp.o) with fedora-packages indexing, I get a lot of connection errors.

On the client side (fedora-packages making requests to mdapi):

Exception in thread Thread-26:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 804, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/threading.py", line 757, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/site-packages/fedoracommunity/pool.py", line 33, in run
    result = func(item)
  File "/usr/lib/python2.7/site-packages/fedoracommunity/search/index.py", line 365, in io_work
    package = self.construct_package_dictionary(package)
  File "/usr/lib/python2.7/site-packages/fedoracommunity/search/index.py", line 288, in construct_package_dictionary
    package['sub_pkgs'] = list(self.get_sub_packages(package))
  File "/usr/lib/python2.7/site-packages/fedoracommunity/search/index.py", line 306, in get_sub_packages
    response = local.http.get(url)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 537, in get
    return self.request('GET', url, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 524, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 637, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', BadStatusLine("''",))

On the server side (mdapi in OpenShift):

2019-05-15 11:31:22,850 [ERROR] aiohttp.server: Unhandled exception
Traceback (most recent call last):
  File "/usr/lib64/python3.7/site-packages/aiohttp/web_protocol.py", line 411, in start
    await resp.write_eof()
  File "/usr/lib64/python3.7/site-packages/aiohttp/web_response.py", line 596, in write_eof
    await super().write_eof(body)
  File "/usr/lib64/python3.7/site-packages/aiohttp/web_response.py", line 401, in write_eof
    await self._payload_writer.write_eof(data)
  File "/usr/lib64/python3.7/site-packages/aiohttp/http_writer.py", line 136, in write_eof
    self._write(chunk)
  File "/usr/lib64/python3.7/site-packages/aiohttp/http_writer.py", line 67, in _write
    raise ConnectionResetError('Cannot write to closing transport')
ConnectionResetError: Cannot write to closing transport

I am wondering whether the connections are getting killed by the proxies, but I don't know enough about how the proxies work to be sure.


What host(s) are the connections using, resolving to what?

Metadata Update from @kevin:
- Issue priority set to: Waiting on Assignee (was: Needs Review)

7 months ago

So the connections are against mdapi.fedoraproject.org and it resolves to either proxy110.phx2.fedoraproject.org (10.5.126.9) or proxy101.phx2.fedoraproject.org (10.5.126.8).
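
For anyone who wants to repeat the lookup, here is a minimal sketch using only the Python standard library. The helper names `resolve_host` and `reverse_name` are made up for this example, and the result depends on where it runs: the internal 10.5.x addresses above are presumably only returned from inside the infrastructure.

```python
import socket


def resolve_host(host, port=443):
    """Return the sorted, de-duplicated addresses that `host` resolves to."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def reverse_name(address):
    """Best-effort reverse lookup; returns None when no name can be found."""
    try:
        return socket.gethostbyaddr(address)[0]
    except OSError:
        return None
```

Calling `resolve_host("mdapi.fedoraproject.org")` on the host that makes the requests, then `reverse_name()` on each address, should show which proxies that client is actually talking to.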

@cverna out of curiosity, is this still happening?

I have not tried again since last time. I wanted to try it locally to see if I could reproduce the error.

So I spent a little time looking at this today and I found out that the responses are much slower on OpenShift.

One difference is the version of Python (3.7 in OpenShift and 3.6 in the VM). I am not sure whether there were major changes to asyncio between these versions, but that could be an explanation.

See the performance test results below.

OpenShift

ab -c 100 -n 100 https://mdapi.fedoraproject.org/rawhide/pkg/guake
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking mdapi.fedoraproject.org (be patient).....done


Server Software:        Python/3.7
Server Hostname:        mdapi.fedoraproject.org
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,4096,128
Server Temp Key:        X25519 253 bits
TLS Server Name:        mdapi.fedoraproject.org

Document Path:          /rawhide/pkg/guake
Document Length:        2676 bytes

Concurrency Level:      100
Time taken for tests:   34.945 seconds
Complete requests:      100
Failed requests:        74
   (Connect: 0, Receive: 0, Length: 74, Exceptions: 0)
Non-2xx responses:      74
Total transferred:      127137 bytes
HTML transferred:       76384 bytes
Requests per second:    2.86 [#/sec] (mean)
Time per request:       34944.770 [ms] (mean)
Time per request:       349.448 [ms] (mean, across all concurrent requests)
Transfer rate:          3.55 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      386  478  50.6    485     558
Processing:  1453 26441 8068.3  30401   30716
Waiting:     1453 26441 8068.7  30401   30716
Total:       1926 26919 8090.6  30881   31242

Percentage of the requests served within a certain time (ms)
  50%  30881
  66%  30922
  75%  30936
  80%  30939
  90%  30956
  95%  30960
  98%  31184
  99%  31242
 100%  31242 (longest request)

VM

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking apps.fedoraproject.org (be patient).....done


Server Software:        Python/3.6
Server Hostname:        apps.fedoraproject.org
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,4096,128
Server Temp Key:        X25519 253 bits
TLS Server Name:        apps.fedoraproject.org

Document Path:          //mdapi/rawhide/pkg/guake
Document Length:        2676 bytes

Concurrency Level:      100
Time taken for tests:   6.130 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      317781 bytes
HTML transferred:       267600 bytes
Requests per second:    16.31 [#/sec] (mean)
Time per request:       6129.748 [ms] (mean)
Time per request:       61.297 [ms] (mean, across all concurrent requests)
Transfer rate:          50.63 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      381  560  82.1    593     646
Processing:   343 2510 1275.0   2480    4866
Waiting:      326 2493 1268.8   2480    4866
Total:        726 3070 1323.7   3102    5417

Percentage of the requests served within a certain time (ms)
  50%   3102
  66%   3682
  75%   4214
  80%   4530
  90%   5013
  95%   5183
  98%   5392
  99%   5417
 100%   5417 (longest request)
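
To put the two runs side by side, the short sketch below recomputes the derived figures from the raw counters in the reports above. The numbers are copied from the ApacheBench output; the dictionary layout and the `summarize` helper are only for this illustration.

```python
# Figures copied from the two ApacheBench reports above.
runs = {
    "OpenShift": {"concurrency": 100, "complete": 100, "failed": 74,
                  "seconds": 34.945, "median_ms": 30881},
    "VM": {"concurrency": 100, "complete": 100, "failed": 0,
           "seconds": 6.130, "median_ms": 3102},
}


def summarize(run):
    """Recompute ab's derived metrics from its raw counters."""
    rps = run["complete"] / run["seconds"]
    # ab's "Time per request (mean)" is concurrency * total time / completed requests.
    per_request_ms = run["concurrency"] * run["seconds"] * 1000 / run["complete"]
    failure_rate = run["failed"] / run["complete"]
    return {"rps": rps, "per_request_ms": per_request_ms, "failure_rate": failure_rate}


summaries = {name: summarize(run) for name, run in runs.items()}
throughput_ratio = summaries["VM"]["rps"] / summaries["OpenShift"]["rps"]
median_ratio = runs["OpenShift"]["median_ms"] / runs["VM"]["median_ms"]

for name, s in summaries.items():
    print(f"{name}: {s['rps']:.2f} req/s, {s['per_request_ms']:.0f} ms/request, "
          f"{s['failure_rate']:.0%} failed")
print(f"VM throughput is {throughput_ratio:.1f}x OpenShift; "
      f"OpenShift median latency is {median_ratio:.1f}x the VM's")
```

In short, the VM serves roughly 5.7 times more requests per second, the OpenShift median latency is about ten times higher, and 74% of the OpenShift requests came back as non-2xx responses.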

OK, so I got to the bottom of it: this commit is what breaks the performance (https://pagure.io/mdapi/c/2e5b04dc45138192b34f6ea39cbed0023d716f9a?branch=master).

I guess the lock file type is now blocking, which makes the performance worse. I'll close this and open a ticket upstream.
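
As a rough illustration of why a blocking lock would hurt an asyncio server under concurrency, here is a small self-contained sketch. It is not the mdapi code and does not reproduce what the commit does: `time.sleep` merely stands in for any synchronous wait (such as acquiring a blocking file lock), `asyncio.sleep` stands in for an awaitable wait, and `DELAY`, `REQUESTS`, and the handler names are invented for the example.

```python
import asyncio
import time

DELAY = 0.05   # hypothetical time spent waiting on the lock, in seconds
REQUESTS = 10  # hypothetical number of concurrent requests


async def blocking_handler():
    # Stand-in for a synchronous wait: it blocks the whole event loop.
    time.sleep(DELAY)


async def non_blocking_handler():
    # Stand-in for an awaitable wait: it yields to the event loop while waiting.
    await asyncio.sleep(DELAY)


async def run_concurrently(handler):
    """Run REQUESTS handlers at once and return the elapsed wall-clock time."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(REQUESTS)))
    return time.perf_counter() - start


def measure():
    blocking = asyncio.run(run_concurrently(blocking_handler))
    non_blocking = asyncio.run(run_concurrently(non_blocking_handler))
    return blocking, non_blocking


blocking_elapsed, non_blocking_elapsed = measure()
print(f"blocking: {blocking_elapsed:.2f}s, non-blocking: {non_blocking_elapsed:.2f}s")
```

With the blocking handler the waits run one after another, so the total time grows with the number of concurrent requests. With the awaitable version the waits overlap and the total stays close to a single `DELAY`. That pattern would be consistent with the long processing times in the OpenShift run, though the upstream ticket is the place to confirm the actual cause.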

Metadata Update from @cverna:
- Issue close_status updated to: Upstream
- Issue status updated to: Closed (was: Open)

4 months ago
