#8209 Make nagios check_ssl_cert checks happen from one server instead of from all proxies
Closed: Fixed 2 months ago by kevin. Opened 4 months ago by codeblock.

In some cases, when a cert is about to expire, we get an alert about it from every single proxy, which is excessive. This notably happens with the checks in nagios_server/files/nagios/services/ssl.cfg which have hostgroup_name proxies.

For our sanity, these should all be listed under one server, not under every proxy.

This is a relatively simple change and would make a good easyfix for someone looking to make one of their first patches.

Metadata Update from @kevin:
- Issue priority set to: Waiting on Assignee (was: Needs Review)

4 months ago

This is easy enough to change, but... one reason we wanted all the proxies checked was in case we rolled out a new cert and somehow some subset of them didn't update and still had the old cert.

i.e., we could check just one, but then we might miss it if others are affected.

Not sure what's best here...

@codeblock, do you still want to try doing this some other way?

I wonder if we could make them depend on each other as far as Nagios knows, with all the other proxies as dependencies of the first one. Then, if they are all bad, only one would alert, and if only one is bad, that one would still alert?

I'll take a look; let's see what I can come up with.

I wonder if this will work. It checks the certs on koji.fp.o and proxy03.fp.o and suppresses warnings from the koji and proxies hostgroups when those checks already return WARN or CRIT. Pagure, as a single host, can still send notifications whenever necessary.
Caveat: we won't get SSL cert notifications from the koji and proxies hostgroups until the certs on koji.fp.o and proxy03.fp.o are fixed.

```diff
diff --git a/roles/nagios_server/files/nagios/services/ssl.cfg b/roles/nagios_server/files/nagios/services/ssl.cfg
index 275571cc9..b857c8fd2 100644
--- a/roles/nagios_server/files/nagios/services/ssl.cfg
+++ b/roles/nagios_server/files/nagios/services/ssl.cfg
@@ -39,3 +39,19 @@ define service {
   check_command check_ssl_cert!pagure.io!60
   use defaulttemplate
 }
+
+define servicedependency {
+  host_name koji.fedoraproject.org
+  dependent_hostgroup_name koji
+  dependent_service_description koji hosts running SSL checks
+  notification_failure_criteria w,c
+  execution_failure_criteria w,c
+}
+
+define servicedependency {
+  host_name proxy03.fedoraproject.org
+  dependent_hostgroup_name proxies
+  dependent_service_description Proxies running SSL checks
+  notification_failure_criteria w,c
+  execution_failure_criteria w,c
+}
```
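The following minimal Python sketch models the trade-off discussed in this thread. It shows what `notification_failure_criteria w,c` does: while the master SSL check is in WARNING or CRITICAL, the dependent checks do not send notifications. This is a simplified model under stated assumptions, not Nagios itself. It ignores `execution_failure_criteria`, which in addition stops the dependent checks from running. It also ignores soft/hard states and notification intervals. The class and function names and the hosts proxy01/proxy02 are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Nagios state letters as used in *_failure_criteria: o=OK, w=WARNING, u=UNKNOWN, c=CRITICAL.
OK = "o"


@dataclass(frozen=True)
class ServiceDependency:
    """Hypothetical, simplified model of one servicedependency between SSL checks."""

    master_host: str                  # host whose SSL check acts as the master service
    dependent_hosts: tuple[str, ...]  # members of the dependent hostgroup
    notification_failure_criteria: frozenset[str]


def notifying_hosts(states: dict[str, str], deps: list[ServiceDependency]) -> list[str]:
    """Return the hosts whose non-OK SSL check would send a notification.

    A dependent check stays silent while its master is in one of the states
    listed in notification_failure_criteria; the master itself still notifies.
    """
    suppressed: set[str] = set()
    for dep in deps:
        if states.get(dep.master_host, OK) in dep.notification_failure_criteria:
            # Assumption: a check never suppresses itself, even if the master is in the hostgroup.
            suppressed.update(h for h in dep.dependent_hosts if h != dep.master_host)
    return sorted(h for h, state in states.items() if state != OK and h not in suppressed)


# Hypothetical proxies hostgroup; only proxy03 appears in the ticket.
PROXIES = (
    "proxy01.fedoraproject.org",
    "proxy02.fedoraproject.org",
    "proxy03.fedoraproject.org",
)
proxy_dep = ServiceDependency("proxy03.fedoraproject.org", PROXIES, frozenset("wc"))

if __name__ == "__main__":
    # Same expiring cert everywhere: one alert instead of one per proxy.
    print(notifying_hosts({h: "w" for h in PROXIES}, [proxy_dep]))
    # One stale proxy while proxy03 is fine: the stale proxy still alerts.
    print(notifying_hosts({PROXIES[0]: "c", PROXIES[1]: "o", PROXIES[2]: "o"}, [proxy_dep]))
    # The caveat: while proxy03 is CRIT, a separate problem on proxy01 stays silent.
    print(notifying_hosts({PROXIES[0]: "c", PROXIES[1]: "o", PROXIES[2]: "c"}, [proxy_dep]))
```

The third case corresponds to the caveat: a stale cert on another proxy only surfaces once the master check on proxy03 recovers.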

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 months ago
