#2789 is_conn_error fails to match BadStatusLine errors in some cases
Closed: Fixed 3 years ago by tkopecek. Opened 3 years ago by mikem.

This was observed on RHEL7 (py2.7). I'm not sure how widespread it is.

The checks in in_conn_error can fail to match a BadStatusLine error and result in spurious call failures due to keepalive races. E.g.

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/koji/daemon.py", line 1356, in runTask
    response = (handler.run(),)
  File "/usr/lib/python2.7/site-packages/koji/tasks.py", line 328, in run
    return koji.util.call_with_argcheck(self.handler, self.params, self.opts)
  File "/usr/lib/python2.7/site-packages/koji/util.py", line 271, in call_with_argcheck
    return func(*args, **kwargs)
  File "/usr/lib/koji-builder-plugins/builder_containerbuild.py", line 964, in handler
    semi_results = self.createContainer(**kwargs)
  File "/usr/lib/koji-builder-plugins/builder_containerbuild.py", line 772, in createContainer
    return self.handle_build_response(build_response, arch=arch)
  File "/usr/lib/koji-builder-plugins/builder_containerbuild.py", line 504, in handle_build_response
    self._incremental_upload_logs(pid)
  File "/usr/lib/koji-builder-plugins/builder_containerbuild.py", line 352, in _incremental_upload_logs
    incremental_upload(self.session, fname, fd, uploadpath, logger=self.logger)
  File "/usr/lib/python2.7/site-packages/koji/daemon.py", line 61, in incremental_upload
    fast_incremental_upload(session, fname, fd, path, retries, logger)
  File "/usr/lib/python2.7/site-packages/koji/daemon.py", line 101, in fast_incremental_upload
    result = session.rawUpload(contents, offset, path, fname, overwrite=True)
  File "/usr/lib/python2.7/site-packages/koji/__init__.py", line 2360, in __call__
    return self.__func(self.__name, args, opts)
  File "/usr/lib/python2.7/site-packages/koji/__init__.py", line 2771, in _callMethod
    return self._sendCall(handler, headers, request)
  File "/usr/lib/python2.7/site-packages/koji/__init__.py", line 2686, in _sendCall
    return self._sendOneCall(handler, headers, request)
  File "/usr/lib/python2.7/site-packages/koji/__init__.py", line 2732, in _sendOneCall
    r = self.rsession.post(handler, **callopts)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 529, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 486, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 598, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 415, in send
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', BadStatusLine("''",))

Koji uses is_conn_error to detect cases where the our http connection has expired. In such cases, we make a new connection and retry exactly once. While ideally requests would handle this automatically under the hood, there are cases where it can't or won't, like keepalive races.

In is_conn_error, the code attempts to dissect ConnectionError exceptions from the requests module to see if it is the right sort of error. This task is complicated by the fact that requests embeds an underlying error inside the ConnectionError, and that error comes from requests private bundled copy of the urllib3 library.

It turns out this check is kinda fragile 🤷

Metadata Update from @mikem:
- Custom field Size adjusted to None

3 years ago

I was able to replicate this manually on rhel7 under the debugger. The error we're seeing looks like it should be matched, but it does not.

(Pdb) e2
ProtocolError('Connection aborted.', BadStatusLine("''",))
(Pdb) isinstance(e2, requests.packages.urllib3.exceptions.ProtocolError)
False
(Pdb) isinstance(e2, urllib3.exceptions.ProtocolError)
False
(Pdb) e2.__class__
<class 'urllib3.exceptions.ProtocolError'>

I suspect a simpler heuristic like 'BadStatusLine' in str(e) would be more portable here.

Honestly this is kinda weird:

(Pdb) e2
ProtocolError('Connection aborted.', BadStatusLine("''",))
(Pdb) from urllib3.exceptions import ProtocolError as pe
(Pdb) isinstance(e2, pe)
True
(Pdb) import urllib3
(Pdb) isinstance(e2, urllib3.exceptions.ProtocolError)
False
(Pdb) import inspect
(Pdb) inspect.getfile(e2.__class__)
'/usr/lib/python2.7/site-packages/urllib3/exceptions.pyc'
(Pdb) inspect.getfile(urllib3.exceptions.ProtocolError)
'/usr/lib/python2.7/site-packages/urllib3/exceptions.pyc'

I guess maybe requests or something else is monkey patching urllib3?

Hmm, requests doesn't look to do any magic nor sixwhich I've suspected a bit.

Metadata Update from @tkopecek:
- Issue set to the milestone: 1.25

3 years ago

I've noticed that internal machine we've seen this issue had older requests where 'requests.package.urllib3' didn't existed at all. So, we're evidently failing earlier check on requests.exceptions.ConnectionError' so we don't even get to the point where urllib3 exception is checked. Maybe same problem exist on requests level there.
Anyway, safe way shoud be #2819

Metadata Update from @jcupova:
- Issue tagged with: no_qe

3 years ago

Commit 6e35534 relates to this ticket

Login to comment on this ticket.

Metadata
Related Pull Requests
  • #2824 Merged 2 years ago
  • #2819 Merged 3 years ago