There were reports of 500s from src.fedoraproject.org, and on investigation it looks like a staging bugzilla-sync-toddler was stuck downloading the retired json files over and over again:
ansible -a 'grep 10.3.166.119 /var/log/httpd/src.fedoraproject.org-access.log | wc -l' -m shell proxy01.iad\*:proxy10.iad\*
proxy10.iad2.fedoraproject.org | CHANGED | rc=0 >>
408158
proxy01.iad2.fedoraproject.org | CHANGED | rc=0 >>
358202
I blocked it on proxy01/10 to get things back to normal.
Questions:
Why is the stg toddler talking to production? Shouldn't it use src.stg.fedoraproject.org?
Why is it redownloading the same json files over and over? ;(
I see prod is doing this too... ;(
It seems like it's getting a bunch of 502s from pagure?
Metadata Update from @abompard: - Issue assigned to abompard
Looks like the dist-git URL is hardcoded in toddlers: https://pagure.io/fedora-infra/toddlers/blob/main/f/toddlers/utils/pagure.py#_1120
I don't know but it doesn't sound right indeed, I'll add some caching.
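For illustration only, here is a minimal sketch of how a hardcoded dist-git URL could be replaced by a configurable one, so that a staging deployment points at src.stg.fedoraproject.org. The `dist_git_url` config key and the `build_dist_git_url` helper are hypothetical names, not the actual toddlers code or configuration.

```python
# Hypothetical sketch: read the dist-git base URL from the toddler's config
# instead of hardcoding it. The key name "dist_git_url" is an assumption.
DEFAULT_DIST_GIT_URL = "https://src.fedoraproject.org"


def build_dist_git_url(config: dict, path: str) -> str:
    """Join the configured dist-git base URL with a relative path."""
    base = config.get("dist_git_url", DEFAULT_DIST_GIT_URL).rstrip("/")
    return f"{base}/{path.lstrip('/')}"


if __name__ == "__main__":
    stg_config = {"dist_git_url": "https://src.stg.fedoraproject.org"}
    # A staging config resolves to the staging host, an empty one to prod.
    print(build_dist_git_url(stg_config, "some/file.json"))
    print(build_dist_git_url({}, "some/file.json"))
```

With something like this, staging and production differ only in their config, and neither environment needs a code change to reach the right host.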
Metadata Update from @phsmoura: - Issue priority set to: Waiting on Assignee (was: Needs Review) - Issue tagged with: Needs investigation, high-gain, medium-trouble, ops
The caching has been in place in prod for a couple of days now, so there should be far fewer requests. Also, I've removed the hardcoding of src.fp.o, so staging is now hitting staging and not prod:
# ansible -a 'grep 10.3.166.119 /var/log/httpd/src.fedoraproject.org-access.log | wc -l' -m shell proxy01.iad\*:proxy10.iad\*
proxy10.iad2.fedoraproject.org | CHANGED | rc=0 >>
0
proxy01.iad2.fedoraproject.org | CHANGED | rc=0 >>
0
# ansible -a 'grep 10.3.166.119 /var/log/httpd/src.fedoraproject.org-access.log | wc -l' -m shell proxy01.stg.iad\*
proxy01.stg.iad2.fedoraproject.org | CHANGED | rc=0 >>
213
Adding a cache can lead to bugs and issues when the cache is stale, so please keep an eye out for things not working around toddlers.
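To make the staleness trade-off concrete, here is a rough sketch of a time-based (TTL) cache around a download function. It is not the caching that was actually deployed in toddlers; `TTLCache`, its `fetch` callback, and the one-hour default are assumptions chosen for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache downloaded bodies per URL and refetch once an entry is too old."""

    def __init__(
        self,
        fetch: Callable[[str], bytes],
        ttl_seconds: float = 3600.0,  # assumed default, tune to taste
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(url)
        # Serve from the cache while the entry is younger than the TTL.
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        # Missing or stale: download again and remember when we did.
        body = self._fetch(url)
        self._entries[url] = (now, body)
        return body
```

A shorter TTL narrows the window in which stale data can cause problems, at the cost of more requests to dist-git; a failed download raises out of `get` and leaves any previous entry untouched.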
I think the caching has solved this.
Please reopen if there are still any issues noted.
Metadata Update from @kevin: - Issue close_status updated to: Fixed with Explanation - Issue status updated to: Closed (was: Open)