#676 [frontend] pagure-events: send keep-alive tcp packets
Merged 4 years ago by praiskup. Opened 4 years ago by praiskup.
Unknown source pagure-events-keepalive  into  master

@@ -222,6 +222,14 @@
 
     ctx = zmq.Context()
     s = ctx.socket(zmq.SUB)
+
+    # detect server hang/restart (still a chance to lose ~45s of events)
+    # for more info see man tcp(7).
+    s.setsockopt(zmq.TCP_KEEPALIVE,       1)  # turn on keep-alive
+    s.setsockopt(zmq.TCP_KEEPALIVE_IDLE, 30)  # start when 30s inactive
+    s.setsockopt(zmq.TCP_KEEPALIVE_INTVL, 5)  # send keep-alive packet each 5s
+    s.setsockopt(zmq.TCP_KEEPALIVE_CNT,   3)  # restart after 3 fails
+
     s.connect(ENDPOINT)
 
     for topic in TOPICS:

After a Pagure restart (or possibly some other condition we have not
pinned down), we started to see only:

...
[2019-04-19 13:50:06,267][ DEBUG]: Polling...
[2019-04-19 13:50:16,279][ DEBUG]: Polling...
...

in our log, even though Pagure was sending the messages. So
set up some defaults for TCP keep-alive packets (we'll move from
fedmsg (zmq) to fedora-messaging (amqp) anyway, so this is
a rather temporary fix).
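
The "~45s" in the patch comment can be reproduced with a little arithmetic. The sketch below assumes the usual reading of tcp(7): probing starts after the idle period, and the connection is dropped once the configured number of probes, sent one interval apart, go unanswered. The function name is hypothetical and the result is an approximation, not an exact kernel guarantee.

```python
def worst_case_detection_seconds(idle, interval, count):
    """Approximate time before a dead peer is noticed.

    idle     -- seconds of inactivity before the first probe (TCP_KEEPALIVE_IDLE)
    interval -- seconds between unanswered probes (TCP_KEEPALIVE_INTVL)
    count    -- unanswered probes before giving up (TCP_KEEPALIVE_CNT)
    """
    return idle + interval * count


# Values used in this pull request: 30s idle, 5s interval, 3 probes.
print(worst_case_detection_seconds(30, 5, 3))
```

Events published during that window can still be missed, which is why the patch only reduces the problem rather than removing it.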

Metadata Update from @praiskup:
- Pull-request tagged with: needs-work

4 years ago

OK, I can confirm this helps to some extent:
/usr/share/copr/coprs_frontend/run/pagure-events2.py is still accepting actions on staging, but
/usr/share/copr/coprs_frontend/run/pagure-events.py is not anymore (it contains this patch).

Metadata Update from @praiskup:
- Pull-request untagged with: needs-work

4 years ago

There is a typo in the comment, should be "keep-alive"

According to http://api.zeromq.org/3-2:zmq-setsockopt

ZMQ_TCP_KEEPALIVE_IDLE: Override TCP_KEEPCNT(or TCP_KEEPALIVE on some OS)
ZMQ_TCP_KEEPALIVE_CNT: Override TCP_KEEPCNT socket option

Don't they do the same thing then?

Can we have a comment here, that it is possible to find information about these constants in man tcp? It is not that obvious because on http://api.zeromq.org/3-2:zmq-setsockopt there is just a "Override TCP_FOO socket option" and that's it.

> According to http://api.zeromq.org/3-2:zmq-setsockopt
>
> ZMQ_TCP_KEEPALIVE_IDLE: Override TCP_KEEPCNT(or TCP_KEEPALIVE on some OS)
> ZMQ_TCP_KEEPALIVE_CNT: Override TCP_KEEPCNT socket option
>
> Don't they do the same thing then?

I hope that is a bug in the documentation, and that it actually sets TCP_KEEPIDLE:

import zmq
..
In [3]: zmq.TCP_KEEPALIVE_IDLE
Out[3]: 36

In [4]: zmq.TCP_KEEPALIVE_CNT
Out[4]: 35

From src/tcp.cpp:

#ifdef ZMQ_HAVE_TCP_KEEPIDLE
        if (keepalive_idle_ != -1) {
            int rc = setsockopt (s_, IPPROTO_TCP, TCP_KEEPIDLE,
                                 &keepalive_idle_, sizeof (int));
            tcp_assert_tuning_error (s_, rc);
            if (rc != 0)
                return rc;
        }
#else // ZMQ_HAVE_TCP_KEEPIDLE
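
To make the mapping to tcp(7) concrete, here is a minimal sketch that sets the same four knobs on a plain Python socket and reads them back. It assumes Linux, where the standard `socket` module exposes `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT`; the helper names are hypothetical, and this is not how the frontend script configures its ZeroMQ socket.

```python
import socket


def enable_keepalive(sock, idle=30, interval=5, count=3):
    """Enable and tune TCP keep-alive using the Linux option names from tcp(7)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)          # cf. ZMQ_TCP_KEEPALIVE
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)      # cf. ZMQ_TCP_KEEPALIVE_IDLE
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval) # cf. ZMQ_TCP_KEEPALIVE_INTVL
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)      # cf. ZMQ_TCP_KEEPALIVE_CNT


def read_keepalive(sock):
    """Return the keep-alive settings currently stored on the socket."""
    return {
        "enabled": bool(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)),
        "idle": sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE),
        "interval": sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL),
        "count": sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT),
    }
```

Because idle time and probe count are stored as separate kernel options, reading them back after setting different values is a quick way to confirm they do not overwrite each other.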

rebased onto fe0d680f8c26ed6c14f66282261298f0c5d0941c

4 years ago

Added a comment pointing to man tcp, and fixed the keep-alive typo.

rebased onto d2c3ee89c778860de852326d38bc7fb7c599c969

4 years ago

rebased onto 3216acc

4 years ago

Pull-Request has been merged by praiskup

4 years ago