#10541 Nagios checks for pagure.io
Closed: Fixed 2 years ago by aheath1992. Opened 2 years ago by praiskup.

Several times in the last few days I had to ping folks on #fedora-noc or #fedora-infra to
restart pagure.io.

The catch is that we cannot file infra issues when pagure.io is down. :-)

But I noticed that, even though all the pages were returning 503 errors,
no Nagios alerts showed up on #fedora-noc. Perhaps something is misconfigured?


I looked at Nagios, and it seems we are only monitoring the staging instance of pagure.

Metadata Update from @mohanboddu:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: low-trouble, medium-gain, ops

2 years ago

We are monitoring the prod one as well; it's just listed under its internal name: pagure02.fedoraproject.org

Can you share which URL(s) you were hitting that returned 503s? I.e., what exactly should we be monitoring here? Perhaps just https://pagure.io/fedora-infrastructure/issues ?

Metadata Update from @zlopez:
- Issue priority set to: Waiting on Reporter (was: Waiting on Assignee)

2 years ago

Metadata Update from @aheath1992:
- Issue assigned to aheath1992

2 years ago

I have tested this on my lab Nagios instance, and I can get the status code for https://pagure.io/fedora-infrastructure/issues and monitor it. Which repo or job do I need to open a pull request against to add the changes?
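For illustration, here is a minimal sketch of the kind of check being discussed: fetch a URL, read the HTTP status code, and report it using the standard Nagios plugin exit codes (0 OK, 2 CRITICAL). This is a hypothetical standalone script, not what was merged into the ansible repo; in practice the stock `check_http` plugin covers this case. The default URL is the one suggested above, and the names `state_for_status`, `fetch_status`, and `main` are made up for this example.

```python
#!/usr/bin/env python3
"""Hypothetical Nagios-style HTTP status check (illustrative sketch only)."""
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Assumption: the URL suggested in the ticket is a good health indicator.
DEFAULT_URL = "https://pagure.io/fedora-infrastructure/issues"


def state_for_status(status: int) -> int:
    """Map an HTTP status code to a Nagios state: 2xx/3xx are OK, anything else is CRITICAL."""
    if 200 <= status < 400:
        return OK
    return CRITICAL


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url.

    urllib raises HTTPError for 4xx/5xx responses (e.g. the 503s reported
    in this ticket), so the code is read from the exception in that case.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def main(argv: list) -> int:
    """Run the check and print a one-line Nagios status; returns the exit code."""
    url = argv[1] if len(argv) > 1 else DEFAULT_URL
    try:
        status = fetch_status(url)
    except (urllib.error.URLError, OSError) as err:
        # Connection refused, DNS failure, or timeout: treat the site as down.
        print(f"HTTP CRITICAL - {url}: {err}")
        return CRITICAL
    state = state_for_status(status)
    print(f"HTTP {LABELS[state]} - {url} returned {status}")
    return state
```

To use it as a plugin, a wrapper would call `sys.exit(main(sys.argv))` so that Nagios sees the returned exit code; a 503 from pagure.io would then surface as CRITICAL instead of going unnoticed.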

Our ansible repo: https://pagure.io/fedora-infra/ansible

under roles/nagios_server/

Possibly look at roles/nagios_server/templates/nagios/services/websites.cfg.j2 for how other sites are checked...

Metadata Update from @aheath1992:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 years ago


Metadata
Boards 1
ops Status: Backlog