While investigating greenwave's 502 responses to bodhi, I found that greenwave receives a lot of requests from Googlebot (~10,000 in 24h).
Since greenwave runs in OpenShift, I am not sure of the best way to add a robots.txt. One option is to handle it directly in Greenwave; alternatively, would it be possible to manage this from the proxies?
When do you need this? (YYYY/MM/DD)
When is this no longer needed or useful? (YYYY/MM/DD)
If we cannot complete your request, what is the impact?
Currently the proxies are set up with the following robots.txt:

```
[root@proxy01 conf.d][PROD]# less /srv/web/greenwave.fedoraproject.org-robots.txt
User-agent: *
Disallow: /
```

However, this looks like it doesn't work, because the setup is:

```
ProxyPass / "balancer://app-os/"
ProxyPassReverse / "balancer://app-os/"
```
so everything goes to the backend node servers without the front end ever serving the robots.txt file. I am not sure which part is by design and which is by accident.
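If serving it from the proxies is preferred, one possible approach (a sketch only, not the actual config; the file path and balancer name are taken from the snippets above) is to exclude `/robots.txt` from the balancer so httpd serves the static file itself. The exclusion must come before the general `ProxyPass`:

```apache
# Serve the static robots.txt from disk (path taken from the proxy above)
Alias /robots.txt /srv/web/greenwave.fedoraproject.org-robots.txt

# The "!" tells mod_proxy NOT to proxy this path; exclusions must
# precede the catch-all ProxyPass to take effect.
ProxyPass /robots.txt !

ProxyPass / "balancer://app-os/"
ProxyPassReverse / "balancer://app-os/"
```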
Metadata Update from @smooge: - Issue assigned to smooge
Opened a ticket in greenwave for consideration --> https://pagure.io/greenwave/issue/402
Metadata Update from @bowlofeggs: - Issue priority set to: Waiting on Assignee (was: Needs Review) - Issue tagged with: greenwave
OK, I think I have found a solution, but I am not sure whether it will break other sites. I have sent a pre-FBR to get people to review it before I try it.
It looks like this is live:
https://greenwave.fedoraproject.org/robots.txt
```
User-agent: *
Disallow: /
```
:green_book:
Metadata Update from @kevin: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)
Metadata Update from @cverna: - Issue status updated to: Open (was: Closed)
I think the problem is with https://greenwave.app.os.fedoraproject.org/robots.txt
For historical reasons we have two routes for greenwave. We could look at removing the old one, but I think it is the route used by other services, for example bodhi.
So before this change, neither of those entry points had a robots.txt, and now both of them do. I would wait and see whether the googlebots honor the robots.txt; if they don't, we will add a block for them on app.os.
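To sanity-check the rules, Python's standard-library robots.txt parser can evaluate them locally (a minimal sketch; the URL is just illustrative):

```python
from urllib.robotparser import RobotFileParser

# The rules currently served at /robots.txt
rules = ["User-agent: *", "Disallow: /"]

rp = RobotFileParser()
rp.parse(rules)

# "Disallow: /" under "User-agent: *" blocks every path for every crawler,
# including Googlebot, so this returns False.
print(rp.can_fetch("Googlebot", "https://greenwave.fedoraproject.org/api/v1.0/about"))
```

A well-behaved crawler applies the `*` group when it has no group of its own, so this confirms Googlebot should stop crawling once it re-fetches the file.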
Ok, so I'll close this and check whether that worked in a few days :smile:
Thanks
Metadata Update from @cverna: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)