#11642 Intermittent Skipping of Messages in Fedora Messaging Bus Setup
Closed: Fixed 2 years ago by zlopez. Opened 2 years ago by efedin.

Hello,

We've been encountering a rather unusual issue with our Fedora messaging bus setup, where messages are being skipped intermittently. This started last night and has been quite erratic in its occurrence.

In our setup, we're using github2fedmsg to post messages to the Fedora messaging bus. These are then picked up by Jenkins, which runs on the Red Hat intranet and uses a JMS messaging plugin to listen for specific topics and dispatch jobs.

The core of the issue is that, starting on Tuesday, a large portion of messages - I'd estimate around 90% - began being skipped. This wasn't constant; it lasted for about 5 hours, seemed to resolve itself, but then started again yesterday.

What's puzzling is that we can see all the messages are present in the bus via datagrepper. Yet, on our end, everything seems in order - no strange logs, no updates to either the plugin or Jenkins that could explain this.

We would really appreciate it if you could take a look at the fedora messaging side of things to see if everything is functioning as it should. Your insight into this would be incredibly helpful.

Thanks a lot for your help!


@abompard Could you look at this? I'm not sure how the messages could be lost.

Metadata Update from @zlopez:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: Needs investigation, medium-gain

2 years ago

Hey! I can have a look, but first I have a few questions to narrow down the issue.
- could you send me a link to the datagrepper results containing the messages you're seeing?
- could you send me the rabbitmq username that your JMS messaging plugin uses to connect to the bus?

Thanks!

Metadata Update from @abompard:
- Issue assigned to abompard

2 years ago

Sure!

We are listening to org.fedoraproject.prod.github.issue.comment topic, so e.g. https://apps.fedoraproject.org/datagrepper/v2/search?rows_per_page=1&delta=100000&topic=org.fedoraproject.prod.github.issue.comment
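For anyone who wants to repeat this check for another topic, here is a minimal sketch that rebuilds the datagrepper search URL above from its parts. The function name `datagrepper_url` and its defaults are hypothetical; the endpoint and the three query parameters are taken directly from the link in this comment.

```python
from urllib.parse import urlencode

# Search endpoint copied from the link posted in this thread.
DATAGREPPER_SEARCH = "https://apps.fedoraproject.org/datagrepper/v2/search"


def datagrepper_url(topic, delta=100000, rows_per_page=1):
    """Build a datagrepper search URL for a single topic (hypothetical helper).

    The parameter names mirror the query string used in the thread;
    the default values are the ones from that same link.
    """
    query = urlencode(
        {"rows_per_page": rows_per_page, "delta": delta, "topic": topic}
    )
    return f"{DATAGREPPER_SEARCH}?{query}"


print(datagrepper_url("org.fedoraproject.prod.github.issue.comment"))
```

Comparing the messages listed there against the jobs Jenkins actually dispatched is one way to estimate how many messages were skipped.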

As for the rabbitmq username, I didn't find anything; here is the complete configuration of the plugin. Btw, the plugin is https://plugins.jenkins.io/jms-messaging/

- fedMsg:
    hubAddr: "tcp://hub.fedoraproject.org:9940"
    name: "fedmsg"
    pubAddr: "tcp://172.19.4.24:9941"
    topic: "org.fedoraproject.prod.github.issue.comment"

Oh, yeah that's not fedora-messaging, that's the old fedmsg (its predecessor). It did not have any protection against lost messages, so that might be why you're not getting them.
Try switching to the "rabbitmq" mode, with the following config:

hostname: "rabbitmq.fedoraproject.org"
virtualHost: "/public_pubsub"
topic: "org.fedoraproject.prod.github.issue.comment"

For authentication you can use the public SSL cert/key available here: https://github.com/fedora-infra/fedora-messaging/tree/develop/configs
I'm not certain how authentication configuration is done in the JMS plugin, but people more familiar with Java may know how to translate the cert & key to a keystore and truststore.

This will give you read-only access to the bus.

Oh, I see, thank you! For the plugin setup, I also need a port number, exchange, queue, and username for SSL certificate authentication. Could you help me with that?

Port will be the default for AMQPs: 5671.
Exchange shouldn't be necessary if you don't send messages, but if the config requires it, it's amq.topic.
The username is fedora.
The queue can be left blank, which should mean asking the server to give you an autogenerated one. If the JMS plugin/library does not support this, you can choose a UUID4-like name, such as 00000000-0000-0000-0000-000000000000 where all the zeros are hex chars ([0-9a-f]).
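To pull the settings above together, here is a minimal sketch that assembles them into a common AMQP URI form and generates a UUID4 queue name. The helper names `amqps_url` and `queue_name` are hypothetical, and whether the JMS plugin accepts a single URL or only separate fields is an assumption not confirmed in this thread; the host, virtual host, username, and port are the values given above.

```python
import uuid
from urllib.parse import quote


def amqps_url(host, vhost, user, port=5671):
    """Assemble an amqps:// URI from the connection settings (hypothetical helper).

    The virtual host starts with "/", so it is percent-encoded to keep it
    distinct from the path separator in the URI.
    """
    return f"amqps://{user}@{host}:{port}/{quote(vhost, safe='')}"


def queue_name():
    """Return a random UUID4 string to use as a queue name."""
    return str(uuid.uuid4())


print(amqps_url("rabbitmq.fedoraproject.org", "/public_pubsub", "fedora"))
print(queue_name())
```

Authentication still relies on the public cert/key pair mentioned earlier; this sketch only covers the addressing part. If you pick your own queue name, generate it once and keep it in the configuration so that the same queue is reused.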

Hey @efedin, any progress on that?

Hi @abompard, I'm sorry for getting quiet. Migration to rabbitmq completely solved the issue. Thank you for being so helpful! Feel free to close the ticket.

Metadata Update from @zlopez:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 years ago
