stg.pagure.io is indexed by Google (and probably other search engines). We should add a robots.txt to prevent search engines from indexing it.
Metadata Update from @puiterwijk: - Issue tagged with: easyfix
<img alt="stg.pagure.io.html" src="/fedora-infrastructure/issue/raw/48006c6f66fe4fb5ef3a5f60d719f0c734832eab682c9ab0777a42d95aae2319-stg.pagure.io.html" />
The `robots.txt` lgtm. Needs changes in ansible to make use of it.

Actually, I lied, sorry. Per spec, there can't be a blank line between the `User-Agent` and `Disallow` lines.
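For reference, a minimal robots.txt that blocks all crawlers (which is what staging would want) looks like this; note the two directives sit on consecutive lines with no blank line between them:

```
User-Agent: *
Disallow: /
```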
<img alt="robots.txt" src="/fedora-infrastructure/issue/raw/133e4db054e73a10017a1f429c80c35cd5bfa9c3a1aba581b364ecc459c48a4b-robots.txt" />
I can look into making the changes in Ansible to use this. Can someone point me in the right direction for the next step please?
Disclaimer - I'm an apprentice and just started looking at the Ansible layout/configuration.
The repo can be found at https://infrastructure.fedoraproject.org/cgit/ansible.git/ (as seen on current project overview page).
Looking at the pagure/frontend role, I'd convert the robots.txt task from a `file` task to a `template` task. In the template, I'd use the 'pagure-staging' group to set `Disallow: /` for staging and leave the current Disallow/Crawl-delay values for the rest.
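A rough sketch of that approach, with hypothetical paths, destination, and production directive values (the real role layout and values may differ). The task would become:

```yaml
# roles/pagure/frontend/tasks/main.yml (hypothetical path)
- name: Install robots.txt from template
  template:
    src: robots.txt.j2
    dest: /var/www/robots.txt   # assumed destination
    owner: root
    group: root
    mode: "0644"
```

and the template could branch on inventory group membership via Ansible's `group_names` magic variable:

```
# roles/pagure/frontend/templates/robots.txt.j2 (hypothetical)
User-Agent: *
{% if 'pagure-staging' in group_names %}
Disallow: /
{% else %}
# production directives kept as they are today (values here are placeholders)
Disallow: /example/
Crawl-delay: 60
{% endif %}
```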
Please let me know if you are no longer working on this issue and I will make the changes.
Attached is a git patch for this change. Changes made:
Without (much) access, I wasn't able to test. Please let me know if anything else is needed. <img alt="0001-disable-indexing-of-pagure-staging-converted-robots..patch" src="/fedora-infrastructure/issue/raw/ced52b3dfbf4d29f0ecc7948429e6c31ec25c5f044836fd3481f108f60c55818-0001-disable-indexing-of-pagure-staging-converted-robots..patch" />
Can I get a patch review?
Looks good! Sorry for the delay.
I've pushed this in...
:guitar:
Metadata Update from @kevin: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)