#32 builds fail due to EPIPE error
Closed: Fixed 7 years ago Opened 8 years ago by sharkcz.

For about a week I have been seeing builds fail randomly without any obvious reason. The only output I get is

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/koji/daemon.py", line 1161, in runTask
    response = (handler.run(),)
  File "/usr/lib/python2.7/site-packages/koji/tasks.py", line 158, in run
    return koji.util.call_with_argcheck(self.handler, self.params, self.opts)
  File "/usr/lib/python2.7/site-packages/koji/util.py", line 154, in call_with_argcheck
    return func(*args, **kwargs)
  File "/usr/sbin/kojid", line 1128, in handler
    broot.build(fn,arch)
  File "/usr/sbin/kojid", line 511, in build
    rv = self.mock(args)
  File "/usr/sbin/kojid", line 408, in mock
    incremental_upload(self.session, fname, fd, uploadpath, logger=self.logger)
  File "/usr/lib/python2.7/site-packages/koji/daemon.py", line 48, in incremental_upload
    fast_incremental_upload(session, fname, fd, path, retries, logger)
  File "/usr/lib/python2.7/site-packages/koji/daemon.py", line 87, in fast_incremental_upload
    result = session.rawUpload(contents, offset, path, fname, overwrite=True)
  File "/usr/lib/python2.7/site-packages/koji/__init__.py", line 1577, in __call__
    return self.__func(self.__name,args,opts)
  File "/usr/lib/python2.7/site-packages/koji/__init__.py", line 1944, in _callMethod
    self._close_connection()
  File "/usr/lib/python2.7/site-packages/koji/__init__.py", line 1876, in _close_connection
    self._connection[1].close()
  File "/usr/lib64/python2.7/httplib.py", line 844, in close
    sock.close()   # close it manually... there may be other refs
  File "/usr/lib/python2.7/site-packages/koji/ssl/SSLConnection.py", line 82, in close
    self.shutdown()
  File "/usr/lib/python2.7/site-packages/koji/ssl/SSLConnection.py", line 53, in shutdown
    self.__dict__["conn"].shutdown()
  File "/usr/lib/python2.7/site-packages/OpenSSL/SSL.py", line 1522, in shutdown
    self._raise_ssl_error(self._ssl, result)
  File "/usr/lib/python2.7/site-packages/OpenSSL/SSL.py", line 1178, in _raise_ssl_error
    raise SysCallError(errno, errorcode.get(errno))
SysCallError: (32, 'EPIPE')

after clicking "Show result" in the task web UI.

It could be related to some updates on the hub.

koji version on hub: koji-1.10.1-1.4.fed.infra.el7.noarch

koji version on builder: koji-1.10.1-1.fc23.2.fed.infra.noarch


As this pyOpenSSL test shows,

If the underlying socket is closed, :py:obj:`Connection.shutdown`
propagates the write error from the low level write call.

Hence, if we call recv on the socket, we will get an EOF. urllib3 catches this particular exception in a socket wrapper:

def recv(self, *args, **kwargs):
    try:
        data = self.connection.recv(*args, **kwargs)
    except OpenSSL.SSL.SysCallError as e:
        if self.suppress_ragged_eofs and e.args == (-1, 'Unexpected EOF'):
            return b''
        else:
            ... ...

Here is the patch between koji-1.10.0 and koji-1.10.1. The following is the part of the patch that touches the method _callMethod:

     1  @@ -1940,11 +1940,36 @@ class ClientSession(object):
     2                   except (SystemExit, KeyboardInterrupt):
     3                       #(depending on the python version, these may or may not be subclasses of Exception)
     4                       raise
     5  -                except OpenSSL.SSL.Error as e:
     6  -                    # There's no point in retrying this
     7  -                    raise
     8                   except Exception, e:
     9                       self._close_connection()
    10  +                    if isinstance(e, OpenSSL.SSL.Error):
    11  +                        # pyOpenSSL doesn't use different exception
    12  +                        # subclasses, we have to actually parse the args
    13  +                        for arg in e.args:
    14  +                            # First, check to see if 'arg' is iterable because
    15  +                            # it can be anything..
    16  +                            try:
    17  +                                iter(arg)
    18  +                            except TypeError:
    19  +                                continue
    20  +
    21  +                            # We do all this so that we can detect cert expiry
    22  +                            # so we can avoid retrying those over and over.
    23  +                            for items in arg:
    24  +                                try:
    25  +                                    iter(items)
    26  +                                except TypeError:
    27  +                                    continue
    28  +
    29  +                                if len(items) != 3:
    30  +                                    continue
    31  +
    32  +                                _, _, ssl_reason = items
    33  +
    34  +                                if ('certificate revoked' in ssl_reason or
    35  +                                        'certificate expired' in ssl_reason):
    36  +                                    # There's no point in retrying for this
    37  +                                    raise
    38                       if not self.logged_in:
    39                           #in the past, non-logged-in sessions did not retry. For compatibility purposes
    40                           #this behavior is governed by the anon_retry opt.

In line 9 we try to close the connection. In koji-1.10.0, OpenSSL.SSL.Error was caught and re-raised before the connection was closed, hence there was no such issue in koji-1.10.0.

But in koji-1.10.1 the code that handles OpenSSL.SSL.Error was moved to after the connection is closed, hence the issue appears.
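
The effect of that reordering can be reproduced in a toy example. This is only a sketch: `FakeSSLError`, `FakeSysCallError`, `close_connection` and the two `call_*` functions are hypothetical stand-ins, not koji or pyOpenSSL code. It shows that when cleanup runs first inside the handler and itself raises, the caller sees the cleanup error instead of the original one.

```python
class FakeSSLError(Exception):
    """Hypothetical stand-in for OpenSSL.SSL.Error."""


class FakeSysCallError(FakeSSLError):
    """Hypothetical stand-in for OpenSSL.SSL.SysCallError."""


def close_connection():
    # Simulates shutting down a socket whose peer has already gone away.
    raise FakeSysCallError(32, 'EPIPE')


def call_old_order():
    """1.10.0-style ordering: SSL errors are re-raised before any cleanup."""
    try:
        raise FakeSSLError('original failure')
    except FakeSSLError:
        raise
    except Exception:
        close_connection()


def call_new_order():
    """1.10.1-style ordering: cleanup runs first, then the error is inspected."""
    try:
        raise FakeSSLError('original failure')
    except Exception as e:
        close_connection()  # raises here, so the check below never runs
        if isinstance(e, FakeSSLError):
            raise


def error_seen_by_caller(func):
    """Return the type name and args of whatever exception escapes func."""
    try:
        func()
    except Exception as e:
        return type(e).__name__, e.args
    return None
```

With the old ordering the caller sees the original `FakeSSLError`; with the new ordering it sees the `(32, 'EPIPE')` error raised by the close.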

Here I think we have 2 ways to resolve this issue:

  1. Catch the EPIPE SysCallError and pass.
  2. Catch the EPIPE SysCallError and verify that what we needed to do has been done.

I prefer the first one.
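
A minimal sketch of the first option, assuming the close is wrapped in a helper. The name `close_ignoring_epipe` is hypothetical and this is not the patch discussed in this ticket. The fallback class exists only so the snippet runs where pyOpenSSL is not installed.

```python
import errno

try:
    from OpenSSL.SSL import SysCallError
except ImportError:
    class SysCallError(Exception):
        """Stand-in used only when pyOpenSSL is not available."""


def close_ignoring_epipe(conn):
    """Close conn, swallowing only the EPIPE case.

    Returns True if the close completed normally and False if EPIPE was
    suppressed. Any other error is re-raised unchanged.
    """
    try:
        conn.close()
    except SysCallError as e:
        # The traceback above shows args of the form (32, 'EPIPE').
        if e.args and e.args[0] == errno.EPIPE:
            return False
        raise
    return True
```

Only EPIPE is suppressed here, so other failures during shutdown would still surface to the caller.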

@xning yes, probably solution 1. I think most of the work here will be validation. We need to be sure this isn't going to break anything (on a number of different targets).

Here is a patch that we can test.

I've hit some more errors when closing an already closed socket. See this branch.
Test case for the issue I've hit:

  • Use F24/koji-1.10.1.12.fc24 with invalid certificate.
  • $ koji moshimoshi

Error: [('SSL routines', 'SSL_shutdown', 'shutdown while in init')]

The validity of the certificate is checked and the socket is closed. After that, the socket is closed again by the exception handler, and an exception unrelated to the invalid certificate is raised. With the patch from my branch applied, you should get the more accurate error:

OpenSSL.SSL.Error: [('SSL routines', 'ssl3_read_bytes', 'sslv3 alert certificate expired'), ('SSL routines', 'ssl3_read_bytes', 'ssl handshake failure')]
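
One way to avoid the second close is to make closing idempotent. The wrapper below is a sketch under that assumption. `CloseOnce` is a hypothetical name and this is not necessarily what the branch mentioned above does.

```python
class CloseOnce(object):
    """Wrap a connection so that close() reaches it at most once."""

    def __init__(self, conn):
        self._conn = conn
        self._closed = False

    def close(self):
        if self._closed:
            # Already closed (or a close was attempted): do nothing.
            return
        # Mark as closed first so a failing close is not retried later.
        self._closed = True
        self._conn.close()
```

An exception handler can then call `close()` unconditionally without triggering a second shutdown on a socket that was already closed.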

I think that the use-requests-2 branch will do a better job here. If anyone can reproduce this reliably, can they try out this code and see if it solves the issue?

https://github.com/mikem23/koji-playground/tree/use-requests-2

I can get a number of EPIPE errors in kojid locally, but none of them are fatal. Those disappear if I use the above branch.

Confirmed - all my EPIPE errors disappear when using this branch.

I'll file a PR for the use-requests-2 branch later today. I think that is the best way to solve this issue.

Closing since the requests change has been merged

@mikem changed the status to Closed

7 years ago
