#9363 Fedmsg delivery not reliable using tcp://hub.fedoraproject.org:9940
Closed: Fixed 3 years ago by kevin. Opened 3 years ago by shebert.

Describe what you would like us to do:

Attempts to stream messages from tcp://hub.fedoraproject.org:9940 using fedmsg-tail or fedmsg-trigger are not reliable. Messages are either not received at all after connecting, or are received only sporadically.

Datagrepper is used as a comparison in both cases.

See https://apps.fedoraproject.org/datagrepper/raw?rows_per_page=1&delta=127800
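
For the comparison against Datagrepper, a minimal sketch of the same query in Python is shown here. It rebuilds the URL above from its two parameters and reads the message count for the time window. The function names are illustrative, and the sketch assumes the JSON response carries a top-level `total` key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the link above.
DATAGREPPER_RAW = "https://apps.fedoraproject.org/datagrepper/raw"


def build_query(delta_seconds, rows_per_page=1):
    """Build a datagrepper /raw URL covering the last `delta_seconds` seconds."""
    params = {"rows_per_page": rows_per_page, "delta": delta_seconds}
    return DATAGREPPER_RAW + "?" + urlencode(params)


def count_recent_messages(delta_seconds, timeout=30):
    """Return the number of messages datagrepper reports for the window.

    Assumption: the response is JSON with a top-level "total" key.
    """
    with urlopen(build_query(delta_seconds), timeout=timeout) as resp:
        return json.load(resp)["total"]
```

Calling `count_recent_messages(600)` while a fedmsg-tail session has been running for ten minutes gives a rough number to compare against what that session actually printed.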

To reproduce:

# fedmsg-tail
or
# fedmsg-trigger --command date

Expected outcome:

  • A constant stream of messages as seen in Datagrepper

Notes:


When do you need this to be done by? (YYYY/MM/DD)


As soon as possible.


Metadata Update from @pingou:
- Issue assigned to pingou

3 years ago

So with help from @kevin we did:

ansible proxies -m shell -a 'systemctl status fedmsg-gateway'

Which shows the service running on most proxies, but the last log entry was from Sep 27th for the most recent one and from Aug 12th for quite a few.

So we've restarted the service:

ansible proxies -m shell -a 'systemctl restart fedmsg-gateway-3'

Let's see if that helps

Unfortunately, it seems to be the same behaviour

but I noticed your restart command included fedmsg-gateway-3 ?

Metadata Update from @smooge:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: medium-gain, medium-trouble, ops

3 years ago

Unfortunately, it seems to be the same behaviour

fedmsg-tail is running fine for me here.

Can you run fedmsg-config and check the content of the endpoints dictionary?

but I noticed your restart command included fedmsg-gateway-3 ?

That's because it's the python3 version of the service ;-)

"endpoints": {
    "fedora-infrastructure": [
        "tcp://hub.fedoraproject.org:9940"
    ],
    "relay_outbound": [
        "tcp://127.0.0.1:4001"
    ]
},
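
As a rough connectivity check on those endpoints, here is a sketch, assuming the `endpoints` dictionary has already been loaded into Python (for example from the fedmsg-config output). The helper names are hypothetical. A successful TCP connect only rules out basic network problems. It does not prove that messages are flowing behind the endpoint.

```python
import socket
from urllib.parse import urlparse


def tcp_endpoints(endpoints):
    """Flatten an endpoints dict into sorted, unique (host, port) pairs."""
    pairs = set()
    for urls in endpoints.values():
        for url in urls:
            parsed = urlparse(url)
            if parsed.scheme == "tcp" and parsed.hostname and parsed.port:
                pairs.add((parsed.hostname, parsed.port))
    return sorted(pairs)


def resolve_all(host, port):
    """Return every distinct address the host name resolves to.

    Useful if the name maps to several machines, since each new
    connection may then land on a different one.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running `is_reachable` against each address from `resolve_all("hub.fedoraproject.org", 9940)` would show whether every resolved address accepts connections.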

Another reproducer can be:

  • open 3 terminal sessions
  • start a fedmsg-tail in each session but with a 10s delay between.

  • open another terminal session with:

fedora-messaging --conf /etc/fedora-messaging/fedora.toml consume

compare all 3.
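
The comparison step can be sketched as below, assuming the output of each session has already been reduced to a list of message IDs. How the IDs are extracted is left out, and the session names are placeholders.

```python
def missing_per_session(reference_ids, sessions):
    """For each session, list the reference message IDs it never saw.

    reference_ids: IDs from the source treated as complete
                   (e.g. the fedora-messaging consumer).
    sessions:      mapping of session name -> iterable of IDs it received.
    """
    reference = set(reference_ids)
    return {
        name: sorted(reference - set(seen))
        for name, seen in sessions.items()
    }
```

A session that connected to a healthy proxy should come back with an empty list. A session that received nothing will list every reference ID.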

ok. Try now? I think it's fixed?

My theory/what I fixed:

busgateway01 had a VPN IP in the old transition IP space that we used when moving from phx2 to iad2. This meant that none of the proxies could talk to it over the VPN. The only ones that worked were the ones in the iad2 datacenter (since they could just talk to it directly). I fixed the IP on busgateway and restarted things and confirmed here that fedmsg-tail seems much better.
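
A check for this kind of stale address could look roughly like the sketch below. The actual VPN ranges are not given in this ticket, so the network and addresses in the usage note are documentation placeholders (RFC 5737), and `proxy01` is a made-up host name.

```python
import ipaddress


def hosts_outside_network(host_ips, network):
    """Return the names of hosts whose IP is not inside `network`.

    host_ips: mapping of host name -> IP address string.
    network:  the expected VPN range in CIDR notation.
    """
    net = ipaddress.ip_network(network)
    return sorted(
        name
        for name, ip in host_ips.items()
        if ipaddress.ip_address(ip) not in net
    )
```

For example, with placeholder values, `hosts_outside_network({"busgateway01": "198.51.100.7", "proxy01": "192.0.2.10"}, "192.0.2.0/24")` flags only `busgateway01`.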

Yep! all fixed. All 10 of my fedmsg-tail sessions are pulling in messages. Thanks!

Awesome. Sorry this happened... ;(

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

3 years ago

