Before we move Taskotron to production, we need to have some monitoring in place so that there are notifications if/when stuff starts going down.
Determine what needs to be monitored and make sure that the various monitoring systems (nagios, most likely) are updated.
The things that come to mind for monitoring are:

* http ping for:
  * taskotron master landing page
  * resultsdb_frontend main page
  * resultsdb landing page
* ping for:
  * taskotron-client
  * taskotron-[dev,stg,prod]
  * resultsdb-[dev,stg,prod]
  * qa-db01.qa?
* client status
  * could be done by scraping the client status page from master, the json api from master, or commands sent to each client
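As a rough illustration of the "http ping" checks listed above, here is a minimal sketch of a Nagios-style plugin in Python. It only assumes the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names, the 5 second warning threshold, and the script name are made up for this example, and the stock `check_http` plugin that ships with Nagios would likely cover the same ground.

```python
import sys
import time
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(status, elapsed, warn_seconds=5.0):
    """Map an HTTP status (None if unreachable) and a response time to a state.

    The 5 second default is an arbitrary assumption, not a project setting.
    """
    if status is None or status >= 400:
        return CRITICAL
    if elapsed > warn_seconds:
        return WARNING
    return OK


def check_http(url, timeout=10.0, warn_seconds=5.0):
    """Fetch url once and return (exit_code, one-line status message)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx/5xx).
        status = err.code
    except OSError:
        # Covers URLError, refused connections, DNS failures and timeouts.
        status = None
    elapsed = time.monotonic() - start
    state = classify(status, elapsed, warn_seconds)
    message = "HTTP %s - %s status=%s time=%.2fs" % (
        LABELS[state], url, status, elapsed)
    return state, message


def main(argv):
    """Entry point: expects the URL to check as the only argument."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_http_page.py URL")
        return UNKNOWN
    state, message = check_http(argv[1])
    print(message)
    return state
```

To use this as an actual plugin, the script would end with `sys.exit(main(sys.argv))` and be pointed at each landing page in turn (taskotron master, resultsdb_frontend, resultsdb). The real URLs are not given in this ticket, so they are left out here.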
wishlist:

* make sure that trigger is working (not sure how to do that in a sane fashion)
There is a nagios plugin in the upstream buildbot repo: https://github.com/buildbot/buildbot/blob/master/master/contrib/check_buildbot.py
monitoring setup ticket filed with fedora infra: https://fedorahosted.org/fedora-infrastructure/ticket/4541
The monitoring setup ticket was closed a while back - closing this
Metadata Update from @tflink: - Issue tagged with: infrastructure