#3736 add some nagios monitoring for builders
Closed: Fixed None Opened 10 years ago by kevin.

I'd like to add some nagios monitoring for builders.

  • The primary builders are virtual instances running on buildvmhosts. You can get the full list by searching /var/log/virthost-lists.out on lockbox01 for 'build'.

  • The arm builders are armXX-builderYY.arm.fedoraproject.org where XX is 01 or 02, and YY is 00-23.
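For reference, the ARM naming pattern above expands to 48 hostnames. A minimal sketch that enumerates them, assuming both XX and YY are always zero-padded to two digits (the variable names `chassis` and `node` are only illustrative, not terms from this ticket):

```python
def arm_builder_hostnames():
    """Return every ARM builder FQDN implied by the armXX-builderYY pattern."""
    return [
        f"arm{chassis:02d}-builder{node:02d}.arm.fedoraproject.org"
        for chassis in (1, 2)      # XX is 01 or 02
        for node in range(24)      # YY is 00 through 23
    ]


if __name__ == "__main__":
    for name in arm_builder_hostnames():
        print(name)
```

A list like this could feed whatever generates the per-host definitions, rather than typing each entry by hand.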

Alerts for these hosts should be set up to send email notifications only. They should not send epager notices: if one of them fails, the only thing impacted is the specific jobs it is working on. This could perhaps be done via a separate contactgroup.
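One possible shape for such a contactgroup, sketched as a small Python helper that renders the Nagios object definition. This assumes the group's members are contacts whose notification commands only send email, since in Nagios the delivery method is set on the contact rather than on the group. The member names in the usage example are hypothetical.

```python
def render_contactgroup(name, alias, members):
    """Render a Nagios contactgroup object definition as text."""
    lines = [
        "define contactgroup {",
        f"    contactgroup_name {name}",
        f"    alias {alias}",
        f"    members {','.join(members)}",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # "admin1" and "admin2" are hypothetical contact names.
    print(render_contactgroup(
        "build-sysadmin-email",
        "Build sysadmins (email only)",
        ["admin1", "admin2"],
    ))
```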

puppet/modules/nagios/files/nagios/ would be the place to look at the existing monitoring.
Note that the buildvms should have dependencies on their buildvmhost.
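A rough sketch of how that dependency could be expressed, assuming the Nagios `parents` directive is used (a host whose parent is down is reported as unreachable rather than down). The template name `builder-template` and the VM-to-host pairing in the usage example are illustrative assumptions, not values taken from the existing puppet files.

```python
def render_host(fqdn, parent, template, contact_group):
    """Render a Nagios host definition whose parent is its buildvmhost."""
    short_name = fqdn.split(".")[0]
    return "\n".join([
        "define host {",
        f"    host_name {fqdn}",
        f"    alias {short_name}",
        f"    use {template}",
        f"    address {fqdn}",
        f"    parents {parent}",
        f"    contact_groups {contact_group}",
        "}",
    ])


if __name__ == "__main__":
    # Template name and pairing are placeholders for illustration.
    print(render_host(
        "buildvm-12.phx2.fedoraproject.org",
        "buildvmhost-04",
        "builder-template",
        "build-sysadmin-email",
    ))
```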


Here is a sample config for a host. Let me know if it is fine:

{{{
define host {
    host_name buildvm-12.phx2.fedoraproject.org
    alias buildvm-12
    use Builder VM 12
    check_command check-host-alive
    address buildvm-12.phx2.fedoraproject.org
    parents buildvmhost-04
    contact_groups build-sysadmin-email
}
}}}

Yep. That looks pretty good to me for a host...

builder machines' host def for nagios
builders_nagios.patch

Attached is a git diff patch containing all the host definitions and the contact_group definition. I have added myself as a contact for testing purposes. Please feel free to remove that if required.

patch to fix the wrong template usage.
builders_nagiosv2.patch

remove builders from hostgroups server and nomail
builders_nagiosv4.patch

Done!

Thanks for all the work.

We will likely adjust this moving forward, but it should be pretty good for now.
