#5453 add monitoring for mirrorlist servers dropping out of haproxy
Closed: Fixed None Opened 7 years ago by kevin.

Sometimes our mirrorlist servers stop processing requests. This isn't a great problem as haproxy just stops sending them things, but we would like to know so we can fix them.

So this could just monitor haproxy on say proxy01 and look for mirrorlists dropping off, or perhaps it could use the same url that haproxy uses to see if a mirrorlist is 'alive' and just directly check them all.
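For illustration, the first option (watching haproxy for mirrorlist servers that drop off) could look roughly like the sketch below. It is not the check_haproxy_mirrorlist.py script attached to this ticket. It assumes the haproxy `show stat` CSV output is already available as text, and the `proxy_prefix` default and the function name are hypothetical.

```python
import csv
import io

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_mirrorlist(stats_csv, proxy_prefix="mirror"):
    """Return (exit_code, message) for haproxy 'show stat' CSV text.

    proxy_prefix is a hypothetical default; use whatever prefix the
    mirrorlist proxies actually have in the haproxy configuration.
    """
    # haproxy starts the CSV header line with "# ", so strip that first.
    reader = csv.DictReader(io.StringIO(stats_csv.lstrip("# ")))
    down = []
    seen = 0
    for row in reader:
        # Skip the aggregate FRONTEND/BACKEND rows; keep individual servers.
        if row["svname"] in ("FRONTEND", "BACKEND"):
            continue
        if not row["pxname"].startswith(proxy_prefix):
            continue
        seen += 1
        # Status can carry a suffix such as "DOWN 1/2", so match the prefix.
        if row["status"].startswith("DOWN"):
            down.append("%s/%s" % (row["pxname"], row["svname"]))
    if seen == 0:
        return UNKNOWN, "UNKNOWN: no mirrorlist servers found in haproxy stats"
    if down:
        return CRITICAL, "CRITICAL: %d of %d mirrorlist servers down: %s" % (
            len(down), seen, ", ".join(down))
    return OK, "OK: all %d mirrorlist servers up" % seen
```

A wrapper would print the message and exit with the returned code, which is what nagios/nrpe expects from a plugin. Any server reported as DOWN is treated as CRITICAL here; a real check might prefer WARNING while at least one mirrorlist is still serving.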


nagios-nrpe check script (check_mirrorlist.py) ready for testing on proxy01.stg (placed under /tmp/)

fixed and tested, with help from smooge, on proxy01.stg and proxy01

Attaching for review.

next: nagios changes and then committing to ansible git

changes are ready for commit and push.

Below is the diff against the touched nagios role files, followed by the list of newly added files.

{{{
diff --git a/roles/nagios/client/tasks/main.yml b/roles/nagios/client/tasks/main.yml
index 7d1651d..60d38ff 100644
--- a/roles/nagios/client/tasks/main.yml
+++ b/roles/nagios/client/tasks/main.yml
@@ -41,6 +41,7 @@
copy: src="scripts/{{ item }}" dest="{{ libdir }}/nagios/plugins/{{ item }}" mode=0755 owner=nagios group=nagios
with_items:
- check_haproxy_conns.py
+ - check_haproxy_mirrorlist.py
- check_postfix_queue
- check_raid.py
- check_lock
@@ -184,6 +185,7 @@
template: src={{ item }}.j2 dest=/etc/nrpe.d/{{ item }}
with_items:
- check_happroxy_conns.cfg
+ - check_happroxy_mirrorlist.cfg
- check_varnish_proc.cfg
when: inventory_hostname.startswith('proxy')
notify:
diff --git a/roles/nagios/server/files/nrpe.cfg b/roles/nagios/server/files/nrpe.cfg
index 752bca5..07c4593 100644
--- a/roles/nagios/server/files/nrpe.cfg
+++ b/roles/nagios/server/files/nrpe.cfg
@@ -237,6 +237,7 @@ command[check_fedmsg_tweet_proc]=/usr/lib64/nagios/plugins/check_procs -c 1:1 -C
command[check_fedmsg_masher_proc]=/usr/lib64/nagios/plugins/check_procs -c 1:1 -C 'fedmsg-hub' -u apache
command[check_supybot_fedmsg_plugin]=/usr/lib64/nagios/plugins/check_supybot_plugin -t fedmsg
command[check_haproxy_conns]=/usr/lib64/nagios/plugins/check_haproxy_conns.py
+command[check_haproxy_mirrorlist]=/usr/lib64/nagios/plugins/check_haproxy_mirrorlist.py
command[check_redis_proc]=/usr/lib64/nagios/plugins/check_procs -c 1:1 -C 'redis-server' -u redis
command[check_autocloud_proc]=/usr/lib64/nagios/plugins/check_procs -c 1:1 -C 'python' -a 'autocloud_job.py' -u root
command[check_openvpn_link]=/usr/lib64/nagios/plugins/check_ping -H 192.168.1.41 -w 375.0,20% -c 500,60%

}}}

{{{

roles/nagios/client/files/scripts/check_haproxy_mirrorlist.py
roles/nagios/client/templates/check_happroxy_mirrorlist.cfg.j2
roles/nagios/server/files/nagios/services/haproxy_mirrorlist.cfg
roles/nagios/server/files/plugins/check_haproxy_mirrorlist.py

}}}

Changes pushed and tested on noc01 for proxy01. Also tested by shutting down one of the mirrorlist servers. Thanks to smooge and nirik.

Also, added the check to proxy04
