#7645 Greenwave gets a lot of requests from googlebot
Closed: Fixed 5 years ago by cverna. Opened 5 years ago by cverna.

  • Describe what you need us to do:

While investigating greenwave's 502 responses to bodhi, I found out that greenwave receives a lot of requests from googlebot (~10,000 in 24h).

Since greenwave runs in OpenShift, I am not sure what the best way to add a robots.txt is. One way is to handle it directly in Greenwave; otherwise, would it be possible to manage this from the proxies?

  • When do you need this? (YYYY/MM/DD)

  • When is this no longer needed or useful? (YYYY/MM/DD)

  • If we cannot complete your request, what is the impact?


Currently the proxies are set up with the following robots.txt

[root@proxy01 conf.d][PROD]# less /srv/web/greenwave.fedoraproject.org-robots.txt
User-agent: *
Disallow: /

However this looks like it doesn't work because the setup is:
ProxyPass / "balancer://app-os/"
ProxyPassReverse / "balancer://app-os/"

so everything goes to the backend node servers and the front end never serves the robots.txt file. I am not sure which part is by design and which is by accident.
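
One possible way to let the proxy answer for robots.txt itself is sketched below. It assumes mod_alias is loaded and that httpd is allowed to read the file shown above; it is an illustration only, not necessarily the change that was eventually deployed.

```apache
# Sketch only: serve /robots.txt from the proxy instead of the backend.
# The exclusion has to come before the catch-all ProxyPass, because
# ProxyPass rules are checked in configuration order and the first match wins.
ProxyPass /robots.txt !
Alias /robots.txt /srv/web/greenwave.fedoraproject.org-robots.txt

# Existing catch-all rules, unchanged.
ProxyPass / "balancer://app-os/"
ProxyPassReverse / "balancer://app-os/"
```

Depending on the rest of the configuration, access to the aliased file may also need to be granted explicitly.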

Metadata Update from @smooge:
- Issue assigned to smooge

5 years ago

Opened a ticket in greenwave for consideration --> https://pagure.io/greenwave/issue/402

Metadata Update from @bowlofeggs:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: greenwave

5 years ago

OK I think I have found a solution, but I am not sure if it will break other sites. I have sent a pre-FBR to get people to review before I try it.

It looks like this is live:

https://greenwave.fedoraproject.org/robots.txt

User-agent: *
Disallow: /

:green_book:

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

5 years ago

Metadata Update from @cverna:
- Issue status updated to: Open (was: Closed)

5 years ago

I think the problem is with https://greenwave.app.os.fedoraproject.org/robots.txt

For historical reasons we have 2 routes for greenwave. We could look at removing the old one, but I think this is the route used by other services, for example bodhi.

So before this change, neither of those entry points had a robots.txt and now both of them do. I would wait and see whether googlebot honors the robots.txt, and if it doesn't, we will add a block for it on app.os.

OK, so I'll close this and check in a few days whether that worked :smile:

Thanks

Metadata Update from @cverna:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

5 years ago
