#12832 Nagios server `notify-by-fedora-messaging` command might not work
Closed: Fixed a month ago by amedvede. Opened 2 months ago by amedvede.

During the releng repo refactoring I noticed that the Nagios server runs a command called fedora-messaging-logger: https://pagure.io/fedora-infra/ansible/blob/main/f/roles/nagios_server/files/nagios/commands/notify.cfg#_70. But I couldn't find where it comes from. Kevin and I think that it's not working now and that it was previously delivered as an RPM.


What is it supposed to do? I'm just thinking about what requirements need to be ported to Zabbix, eventually :)

It used to be a call to emit a message on the fedmsg bus... and it either broke when we retired that in favor of fedora-messaging or there was some fedora-messaging config that got dropped.

But basically it's to emit a message on our message bus about the alert.

However, given the way fmn works, I am not sure how useful this is now for notifications.
But perhaps it's good for stats/data ?
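For readers unfamiliar with the idea, "emit a message on our message bus about the alert" could look roughly like the sketch below. It is only an illustration: the topic name, the body keys, and the function names are assumptions made for the example, not the schema or interface of the original fedora-messaging-logger script. The publish step uses `fedora_messaging.api.publish` and is skipped if the library is not installed.

```python
"""Sketch: turn a Nagios alert into a message-bus payload (hypothetical names)."""


def build_alert_message(host, service, state, output, notification_type="PROBLEM"):
    """Build a (topic, body) pair describing one Nagios alert.

    The topic and body layout are illustrative assumptions, not the
    format used by the original script.
    """
    # An empty service name means the alert is about the host itself.
    kind = "service" if service else "host"
    topic = f"nagios.{kind}.state.change"
    body = {
        "host": host,
        "service": service,
        "state": state,
        "output": output,
        "type": notification_type,
    }
    return topic, body


def publish_alert(topic, body):
    """Publish the alert with fedora-messaging; return False if unavailable."""
    try:
        from fedora_messaging import api, message
    except ImportError:
        # Library not installed: nothing is sent.
        return False
    api.publish(message.Message(topic=topic, body=body))
    return True
```

A Nagios notification command would call something like this with the host, service, and state macros as arguments. Reporting a missing library back to the caller, rather than failing quietly, is what would have made the breakage discussed here visible.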

Metadata Update from @james:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: low-gain, low-trouble

2 months ago

If it works now, I can add a comment stating that if it fails at some point, it should be removed; just a suggestion.

It's not working, I am pretty sure. It's just silently not sending any messages. ;(

Is it okay if I remove it then?

I'd rather we fix it. ;) Is there a copy of this script we can have it use somewhere?

I'm not sure, this command was added 5 years ago in this commit: https://pagure.io/fedora-infra/ansible/c/41fe0ec74efa59c44416ed00e984bde2d9b64672
Now in the releng repo this script is available in a different location, but judging by the arguments it takes, it's exactly the script that was used in Nagios: https://pagure.io/releng/blob/0411361a65a93efc089c6ac52a0230e25bde3b39/f/scripts_new/infrastructure/messaging/fedora_messaging_logger.py

I can add this script to /usr/bin on the noc server and change the name back to what it was, and it will work again. But I don't have access to the noc server, so let me know what you think.

Well, you should do that in an ansible PR. ;) Then you don't need any access... :)

Metadata Update from @amedvede:
- Issue assigned to amedvede

2 months ago

There are a lot of format warnings in the Zuul job that are not connected to my PR. Should I fix them? Also, it's not clear to me what the next step is. Yesterday Greg said that it might be a good idea to add fedora-messaging notifications to Zabbix. I'll discuss it with him. But back to Nagios: cloning the whole releng repo during deploy is too much work to get just one script. So the solution can be either to fetch only that one script instead of the whole repo, or to move the script to the server where we run playbooks and copy it from there. I don't know which path is best.
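The "fetch only one script" option could be sketched as below. This assumes Pagure serves raw file contents under a `/<repo>/raw/<ref>/f/<path>` URL, which should be checked before relying on it. The function names are hypothetical, and in an Ansible role the same step would more likely be a download task than a Python helper.

```python
"""Sketch: download a single file from a Pagure repo instead of cloning it."""
from urllib.request import urlopen


def raw_url(repo, ref, path, base="https://pagure.io"):
    """Build the raw-file URL for one file at a given commit or branch.

    The /raw/<ref>/f/<path> layout is an assumption about Pagure's URLs.
    """
    return f"{base}/{repo}/raw/{ref}/f/{path}"


def fetch_script(url, dest, timeout=30):
    """Download url to dest and return the number of bytes written."""
    with urlopen(url, timeout=timeout) as resp:
        data = resp.read()
    with open(dest, "wb") as fh:
        fh.write(data)
    return len(data)
```

Pinning `ref` to a commit hash, as in the link quoted earlier in the thread, keeps the deployed script from changing unexpectedly when the releng repo is refactored again.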

I should have replied to this earlier, sorry...

However, given the way fmn works, I am not sure how useful this is now for notifications.

Probably not. We already have Matrix notifications from both Nagios and Zabbix, and Zabbix can support arbitrary other targets for notifications via webhooks (indeed, the Zabbix Matrix notifications are via a webhook). So, if we want to send things to FMN, let's do it in Zabbix, as it has the native types we need?

But perhaps it's good for stats/data ?

What stats would that be? We now get all the metrics data in Zabbix and also reporting on things like "most frequent triggers" etc - is FMN doing something on top? My gut feel says we have the data we need, but perhaps I'm wrong?

There are a lot of format warnings in the Zuul job that are not connected to my PR. Should I fix them?

We know it fails, what matters is that we don't make it worse. So long as the errors aren't for your changes it's fine.

it's not clear to me what the next step is

My feeling would be to explore Zabbix and see if this is needed at all? Again, sorry I didn't say this before you did the PR...

So, perhaps I was hasty saying we should fix this.

On the one hand, having alerts in datanommer/message bus allows for folks to pull that and do some data mining on it.
On the other hand, zabbix should have all that (provided we don't drop the database or something).
There's no way with the new fmn to get notifications based on nagios fedora-messages (if they existed).

I guess I am not sure what/who would want to use the data in the end...

So, I guess perhaps we could just drop it from Nagios then, and consider whether we want to add fedora-messaging support for Zabbix or just not bother and ask people to use its native metrics.

We can probably drop it from Nagios, indeed, since we are slowly moving towards Zabbix. On the other hand, it would be good to have it in Zabbix because it would allow us to build event-based automation. But it depends on how many messages there would be in the end, so that we don't clog the message bus by adding many more messages.

We can probably drop it from Nagios, indeed, since we are slowly moving towards Zabbix. On the other hand, it would be good to have it in Zabbix because it would allow us to build event-based automation. But it depends on how many messages there would be in the end, so that we don't clog the message bus by adding many more messages.

Yeah, that was my thought at first, but thinking about it more... do we have any automation around this? I don't think so... so, perhaps until we have a need for it we just don't do it?

After a discussion with Greg, we decided that we don't need fedora-messaging notifications, and if we do need them later, it will be easy to add them to Zabbix. So I commented out the lines in the Nagios config that define the command for notifications through fedora-messaging. PR: https://pagure.io/fedora-infra/ansible/pull-request/2931

Metadata Update from @amedvede:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

a month ago
