When we retired the fedmsg bus this morning, we also retired roles/nagios_server/files/nagios/services/iad2_internal/fedmsg.cfg, which held some 'fedmsg' checks that ran on busgateway01.
However, many of these checks were not really fedmsg checks anymore, but rather fedora-messaging checks. ;)
They checked datagrepper for the last message in a topic and alerted if that message was older than a threshold passed to the check. So, for example, this would let us know there were no bodhi composes in a day (so updates were stuck). Or that bugzilla2fedmsg wasn't processing, etc.
So, we need to re-add these checks. I think it's as easy as getting the check script working on noc01 and then re-adding the checks to run against noc01 instead of busgateway01.
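For anyone picking this up, the alerting rule described above can be sketched roughly like this. It is only an illustration, not the retired script: the function name `check_last_message` and its arguments are made up here, and the only thing taken as given is the standard Nagios plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
from datetime import datetime, timedelta, timezone

# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def check_last_message(last_seen, max_age, now=None):
    """Return (exit_code, text) for a topic whose newest message arrived at last_seen.

    Hypothetical helper: last_seen is a timezone-aware datetime or None,
    max_age is a timedelta threshold such as timedelta(days=1).
    """
    now = now or datetime.now(timezone.utc)
    if last_seen is None:
        # Nothing was found for the topic, so the state cannot be judged.
        return UNKNOWN, "UNKNOWN: no message found for topic"
    age = now - last_seen
    if age > max_age:
        return CRITICAL, f"CRITICAL: last message {age} ago (limit {max_age})"
    return OK, f"OK: last message {age} ago"
```

With a one day threshold, a bodhi compose topic whose last message is 30 hours old would come back CRITICAL, which is the "updates are stuck" case mentioned above.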
Bonus would be adding them to work with zabbix as well.
Metadata Update from @zlopez: - Issue assigned to zlopez - Issue priority set to: Waiting on Assignee (was: Needs Review) - Issue tagged with: high-gain, low-trouble, ops
I will look at it tomorrow.
Here is a PR that should fix that.
This won't be as easy as I thought. The check datanommer scripts call the datanommer-latest command, which is provided by the datanommer-commands package, and that package was retired 2 years ago. I need to figure out what the script does and copy it over as a file in ansible; that should fix it.
Looking at datanommer-latest on busgateway01, it will not work without fedmsg. So if we want the checks in place, we need to find a different way, and that won't be easy.
I will keep the busgateway01 machine for now till we decide what to do about this.
hum. Yeah, perhaps that's why we didn't move them before. ;(
It should just be a curl against datanommer? But I guess then you get into date parsing.
We can likely kill busgateway anyhow, as the checks aren't being run on it. Unless you need to keep it to look at how it was working.
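The date parsing part might not be too bad. Here is a rough sketch, assuming the search response is JSON with a `raw_messages` list, newest first, and an ISO 8601 `sent-at` value under each message's `headers`. Those field names are assumptions and need to be checked against a real response, and `newest_sent_at` is a made-up helper name.

```python
import json
from datetime import datetime, timezone


def newest_sent_at(payload_text):
    """Return the newest message's send time as an aware datetime, or None.

    Assumed (unverified) payload shape:
    {"raw_messages": [{"headers": {"sent-at": "<ISO 8601 timestamp>"}}]}
    """
    data = json.loads(payload_text)
    messages = data.get("raw_messages", [])
    if not messages:
        return None
    stamp = messages[0]["headers"]["sent-at"]
    # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z".
    if stamp.endswith("Z"):
        stamp = stamp[:-1] + "+00:00"
    parsed = datetime.fromisoformat(stamp)
    if parsed.tzinfo is None:
        # Treat a naive timestamp as UTC so it can be compared with "now".
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```

The output of the curl call would be passed in as `payload_text`, and the result subtracted from the current UTC time to get the age of the last message.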
After checking the upstream I see that @abompard already migrated that to fedora messaging. It should be possible to just use that and work with it. I will decommission the machines now as I finally found out where the code lives. And I will start working on adding the checks to noc01 as the code changes are already done.
I see that both datanommer.models and datanommer.commands are packaged for PyPI. Do we want to package them in Fedora again (I can take that over), or just install the PyPI packages directly, as they are maintained by us?
And the busgateway machines are now gone.
For VMs at least, I strongly prefer rpms. If we pip install things or use other package management systems, we have a lot less visibility into what is where and what version it is. We could easily have an insecure/old/outdated version of something and be unaware.
If they are small enough I suppose we could carry them in ansible and install that way. At least greps/etc would find them there...
Since the packages are already available on PyPI, I will just unretire them and package them again in Fedora. I will add infra-sig as co-maintainer, so anybody from the team can touch them.
Currently waiting for reviews of the packages and here is the releng ticket for unretirement https://pagure.io/releng/issue/12597