#910 Setup monitoring for gencert lockfile
Closed: Fixed None Opened 15 years ago by toshio.

Our current certificate infrastructure in FAS requires that we only create one certificate at a time. So we have a lock directory to keep multiple threads/processes from trying to create certificates simultaneously.
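For readers unfamiliar with directory-based locking, here is a minimal shell sketch of the idea. It is not the actual FAS code: the function names and the default `LOCK_DIR` scratch path are hypothetical, chosen so the sketch can be tried without touching the real lock.

```shell
#!/bin/sh
# Hypothetical sketch of a lock directory; NOT the real FAS implementation.
# LOCK_DIR defaults to a scratch path (an assumption) rather than the real lock.
LOCK_DIR="${LOCK_DIR:-/tmp/fedora-ca-demo/lock}"

acquire_lock() {
    # Make sure the parent exists, then try to create the lock directory.
    # mkdir fails if the directory already exists, so only one caller wins.
    mkdir -p "$(dirname "$LOCK_DIR")" && mkdir "$LOCK_DIR" 2>/dev/null
}

release_lock() {
    # Remove the lock directory so the next caller can acquire it.
    rmdir "$LOCK_DIR"
}
```

A caller would run `acquire_lock`, generate the certificate, then run `release_lock`. If the process is killed between those two steps, the directory is left behind with its old timestamp, which is the stale-lock situation this ticket wants to detect.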

The lock directory is fas1:/var/lock/fedora-ca/lock

We need to monitor this lock dir and make sure that it doesn't exist with the same timestamp for a long period of time. If that happens, it probably means that something killed the process which holds the lock and we need to clean up manually.

Cleanup steps are:

{{{
ssh fas1
/etc/init.d/httpd stop
rmdir /var/lock/fedora-ca/lock
/etc/init.d/httpd start
}}}


Can you please tell us how long the lock is normally expected to exist? We need that limit to configure the alert threshold.

It should be very fast (definitely under 5 minutes; likely under 10 seconds).

We don't need to check at that frequency, but if the lock directory is older than that, something is wrong.

This check is already configured:

https://admin.fedoraproject.org/nagios/cgi-bin//extinfo.cgi?type=2&host=fas01&service=Check+certificate+lock

{{{
command[check_lock_file_age]=/usr/lib/nagios/plugins/check_lock_file_age -w 1 -c 5 -f /var/lock/fedora-ca/lock
}}}

{{{
define service {
    host_name            fas01
    service_description  Check certificate lock
    check_command        check_by_nrpe!check_lock_file_age
    use                  defaulttemplate
}
}}}
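The `check_lock_file_age` plugin itself is not shown in this ticket. As a rough illustration of what such a check might do, here is a sketch that compares the lock's modification time against warning and critical thresholds and returns the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name `check_lock_age` is hypothetical, treating the thresholds as minutes is an assumption, and `stat -c %Y` assumes GNU coreutils.

```shell
#!/bin/sh
# Hypothetical sketch; NOT the real check_lock_file_age plugin.
# Usage: check_lock_age LOCK_PATH WARN_MINUTES CRIT_MINUTES
# Assumption: thresholds are whole minutes.

check_lock_age() {
    lock=$1
    warn_min=$2
    crit_min=$3

    # No lock present means nobody is generating a certificate right now.
    if [ ! -e "$lock" ]; then
        echo "OK: $lock does not exist"
        return 0
    fi

    now=$(date +%s)
    # Modification time in seconds since the epoch (GNU stat).
    if ! mtime=$(stat -c %Y "$lock" 2>/dev/null); then
        echo "UNKNOWN: cannot stat $lock"
        return 3
    fi

    age_min=$(( (now - mtime) / 60 ))

    if [ "$age_min" -ge "$crit_min" ]; then
        echo "CRITICAL: $lock is $age_min minutes old"
        return 2
    elif [ "$age_min" -ge "$warn_min" ]; then
        echo "WARNING: $lock is $age_min minutes old"
        return 1
    fi

    echo "OK: $lock is $age_min minutes old"
    return 0
}
```

With the thresholds from the NRPE command above, this would be called as `check_lock_age /var/lock/fedora-ca/lock 1 5`.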
