#8494 Add monitoring for httpd service on resultsdb
Closed: Fixed 4 years ago by kevin. Opened 5 years ago by mizdebsk.

httpd.service on the resultsdb01.qa.fedoraproject.org host crashed and was down for more than 37 hours, yet we didn't get any alert about it. Monitoring of the service should be added to prevent such a long outage from happening in the future.

Jan 01 01:43:23 resultsdb01.qa.fedoraproject.org systemd[1]: httpd.service: A process of this unit has been killed by the OOM killer.
Jan 01 01:43:58 resultsdb01.qa.fedoraproject.org systemd[1]: httpd.service: Failed with result 'oom-kill'.
Jan 01 01:43:58 resultsdb01.qa.fedoraproject.org systemd[1]: httpd.service: Consumed 1h 24min 33.687s CPU time.
Jan 02 14:51:45 resultsdb01.qa.fedoraproject.org systemd[1]: Starting The Apache HTTP Server...
Jan 02 14:51:45 resultsdb01.qa.fedoraproject.org httpd[812580]: [Thu Jan 02 14:51:45.939277 2020] [env:warn] [pid 812580:tid 140307113316672] AH01506: PassEnv variable HOSTNAME was undefined
Jan 02 14:51:46 resultsdb01.qa.fedoraproject.org httpd[812580]: Server configured, listening on: port 80
Jan 02 14:51:46 resultsdb01.qa.fedoraproject.org systemd[1]: Started The Apache HTTP Server.
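
To illustrate the kind of check being requested, here is a minimal sketch of a Nagios-style plugin that reports whether a systemd unit is active. It is not the change that was eventually applied for this ticket. The function names `classify_state` and `check_unit` are hypothetical, and the mapping of transitional states to WARNING is an assumption. It relies only on `systemctl is-active`, which prints the unit state and exits non-zero when the unit is not active, and on the usual plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import subprocess
from typing import Tuple

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_state(unit: str, state: str) -> Tuple[int, str]:
    """Map a systemd active-state string to a (exit code, message) pair."""
    if state == "active":
        return OK, f"OK: {unit} is active"
    # Assumption: transitional states are worth a warning, not a page.
    if state in ("activating", "reloading", "deactivating"):
        return WARNING, f"WARNING: {unit} is {state}"
    # "failed" is what an OOM-killed unit ends up as (result 'oom-kill').
    if state in ("failed", "inactive"):
        return CRITICAL, f"CRITICAL: {unit} is {state}"
    return UNKNOWN, f"UNKNOWN: {unit} reported state {state!r}"


def check_unit(unit: str = "httpd.service") -> Tuple[int, str]:
    """Query systemd for the unit state and classify it."""
    try:
        # No check=True: a non-zero exit is the normal answer for a
        # unit that is not active, not an error in the check itself.
        result = subprocess.run(
            ["systemctl", "is-active", unit],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return UNKNOWN, f"UNKNOWN: could not query {unit}: {exc}"
    return classify_state(unit, result.stdout.strip())
```

A wrapper script would print the message returned by `check_unit()` and exit with the returned code, so that a unit left in the `failed` state, as in the log above, raises a CRITICAL alert instead of going unnoticed.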

Has this issue been reviewed?

Oops. I totally missed the update on this one...

That looks like it should be OK, but the hostname has changed, and so much of Nagios has changed that it won't apply anymore.

Can someone rebase it and use the new name (resultsdb01.iad2.fedoraproject.org)?

Yep, that looks great. :)

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

4 years ago
