#5987 Improve pagure.io monitoring and performance
Closed: Upstream 6 years ago Opened 7 years ago by sochotni.

It seems pagure.io is slowing down and occasionally even returning HTTP 500 errors. While it seems to recover, I am not sure there's enough monitoring set up for this service. Can you review and see if something is missing?

Thanks


My suspicion is that we have some long-running queries that are driving up the load on the machine. I've seen the load go to 15 and above, and I received a few MemoryError reports by email.
These would lead to the 500 errors you see, and they may also be part of the slowdown. I wonder if moving the DB to its own host would gain us some breathing room.
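
To make the "load going to 15 and above" observation checkable, here is a minimal sketch of a load check. It is not the monitoring actually deployed for pagure.io; the function name `load_is_high` and its parameters are hypothetical, and the default threshold of 15 is only borrowed from the figure mentioned above.

```python
import os


def load_is_high(threshold=15.0, loadavg=None):
    """Return True when the 1-minute load average is at or above threshold.

    `loadavg` can be passed in for testing; otherwise the current value is
    read with os.getloadavg(), which is only available on Unix-like systems.
    """
    if loadavg is None:
        loadavg = os.getloadavg()  # tuple of (1 min, 5 min, 15 min) averages
    return loadavg[0] >= threshold


if __name__ == "__main__":
    print("load is high:", load_is_high())
```

A check like this only says that the machine is busy, not why, so it would still need to be paired with something like slow-query logging to confirm the long-running-query suspicion.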

We actually do have monitoring... and it's been going off a lot. 💔

There are ongoing efforts to make things better upstream (improving queries, etc).

I think the next time it gets slow we need to investigate again.

I'm not sure it makes sense to keep this ticket open without any concrete actions to take. Perhaps there's an upstream ticket tracking performance that we could point to?

I'll leave it up to you. Wasn't sure there was a ticket anywhere for it so I filed this one. Feel free to close/move as you see fit.

@kevin @pingou Has this been sorted out?

So, we have done tons of things with pagure over the last month to help with this. ;)

There is still one issue left: on occasion, multiple requests come in that hit very large git repos. All the threads end up handling those, so new requests cannot be processed and the site looks like it's hung. Reloading httpd clears the issue, as does waiting about 5 minutes.
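
The thread exhaustion described above can be reproduced with a toy model. This is only a sketch using Python's `concurrent.futures`, not pagure's or httpd's actual request handling; the names `handle` and `simulate`, the pool size, and the sleep durations are all made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle(cost_s, submitted_at):
    """Simulate one request; return how long it queued before a worker took it."""
    waited = time.monotonic() - submitted_at
    time.sleep(cost_s)  # stand-in for the work, e.g. walking a very large repo
    return waited


def simulate(workers=2, slow_s=0.5):
    """Fill every worker with a slow request, then submit one cheap request."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # These occupy all available workers, like requests on huge git repos.
        slow = [pool.submit(handle, slow_s, time.monotonic())
                for _ in range(workers)]
        # This one is trivial, but no worker is free to pick it up.
        fast = pool.submit(handle, 0.0, time.monotonic())
        return fast.result(), [f.result() for f in slow]


if __name__ == "__main__":
    fast_wait, slow_waits = simulate()
    print(f"cheap request queued for {fast_wait:.2f}s")
    print("slow requests queued for", [f"{w:.2f}s" for w in slow_waits])
```

The cheap request does almost no work, yet it waits roughly as long as one slow request takes, which is why the service appears hung from the outside until the slow requests finish or the workers are restarted.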

We are hoping that some improvements that @puiterwijk is working on in libgit2 will fix this issue.

Otherwise responsiveness should be pretty good these days.

I'm going to close this for now, but if the libgit2 changes don't work we will need to revisit.

Metadata Update from @kevin:
- Issue close_status updated to: Upstream
- Issue status updated to: Closed (was: Open)
