#3360 proton: save messages when connection fails
Merged 2 years ago by tkopecek. Opened 2 years ago by tkopecek.
tkopecek/koji issue3327  into  master

file modified
+7 -2
@@ -323,8 +323,13 @@ 

  def _send_msgs(urls, msgs, CONFIG):

      random.shuffle(urls)

      for url in urls:

-         container = Container(TimeoutHandler(url, msgs, CONFIG))

-         container.run()

+         try:

+             container = Container(TimeoutHandler(url, msgs, CONFIG))

+             container.run()

+         except Exception as ex:

+             # It's ok if we don't send messages for any reason. We'll try again later.

+             LOG.debug(f'container setup error ({url}): {ex}')

+ 

          if msgs:

              LOG.debug('could not send to %s, %s messages remaining',

                        url, len(msgs))

Fixes: https://pagure.io/koji/issue/3327

It looks to me, that on_settled should work correctly also in this case. So 'context.msgs' should be in valid state.

It's dangerous to have bare exceptions like this, especially silently passing them like this. I understand your comment about on_settled, but there is a lot of code that gets called inside these two lines. Also, imagine if anyone ever changes TimeoutHandler in the future to raise more unhandled exceptions. Or imagine we find there are some other errors that we really wish we had known about, etc. For example, https://issues.apache.org/jira/browse/PROTON-2465

Ticket #3327 shows a socket.getaddrinfo() failure, presumably a temporary DNS failure. We could narrow this except statement to only catch socket.error. And I think we should log when we hit this, so we know how bad the problem is, rather than silently passing.

Here is what I recommend instead:

try:
    container = Container(TimeoutHandler(url, msgs, CONFIG))
    container.run()
except socket.error as e:
    LOG.debug('container setup error (%s): %s', url, e)

_send_msgs should always pass. It returns list of un/sent messages so those which were succesfully sent can be deleted from the db queue, so we don't send them twice. We should log every error but I'm not sure if any of these should be terminating.

Ok. I think we should comment why we're doing this. Like this:

try:
    container = Container(TimeoutHandler(url, msgs, CONFIG))
    container.run()
except Exception as e:
    # It's ok if we don't send messages for any reason. We'll try again later.
    LOG.debug('container setup error (%s): %s', url, e)

rebased onto fd2b76e

2 years ago

It looks to me, that on_settled should work correctly also in this case. So 'context.msgs' should be in valid state.

Ok, I see that the docs for on_settled says, "This is the point at which it should never be retransmitted," [1] so hopefully unlikely that we'd have an error causing duplicate transmission. I guess the biggest risk there is a bug in our own on_settled handler, but it seems pretty simple and solid.

[1] https://qpid.apache.org/releases/qpid-proton-0.32.0/proton/python/docs/overview.html

Metadata Update from @tkopecek:
- Pull-request tagged with: testing-ready

2 years ago

pretty please pagure-ci rebuild

2 years ago

Metadata Update from @mfilip:
- Pull-request tagged with: testing-done

2 years ago

Commit b94838b fixes this pull-request

Pull-Request has been merged by tkopecek

2 years ago