Issue #1848: /robots.txt on fedorahosted.org and git.fedorahosted.org - fedora-infrastructure

#1848 /robots.txt on fedorahosted.org and git.fedorahosted.org

Closed: Fixed. Opened 14 years ago by mmcgrath.

Can we get a proper robots.txt deployed on fedorahosted? We're hitting several performance issues, and until we can figure out a plan for how to deal with them I'd like to restrict Google a bit. Specifically, I want to disallow:

https://fedorahosted.org/*/browser/*

We may need to use mod_rewrite based on user agent to force this, not sure. I'm thinking about doing the same for git-web's snapshot for crawlers.
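
To make the intent of the pattern above concrete, here is a minimal Python sketch of how a wildcard-aware crawler might interpret it. It assumes the crawler treats `*` as "any run of characters" and a trailing `$` as an end anchor, as Google does; crawlers that only do plain prefix matching would not honor this rule. The function names and the `someproject` URLs are made up for illustration, and the sketch looks at the URL path only, ignoring any query string.

```python
import re
from urllib.parse import urlsplit


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters, a trailing '$' anchors the end of
    the path, and everything else is treated as literal text.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(url: str, disallow_patterns: list[str]) -> bool:
    """Return True if the URL's path matches any Disallow pattern."""
    path = urlsplit(url).path or "/"
    # re.match anchors at the start of the path, like robots.txt rules do.
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)


# The single rule requested in this ticket.
RULES = ["/*/browser/*"]

# Hypothetical project URLs, for illustration only.
print(is_disallowed("https://fedorahosted.org/someproject/browser/trunk/README", RULES))
print(is_disallowed("https://fedorahosted.org/someproject/wiki/Start", RULES))
```

Under these assumptions the source-browser URL is blocked while the wiki page stays crawlable, which is the behavior the request is after.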

ke4qqq commented 14 years ago

attachment
fh.o-robots.txt.patch

ke4qqq commented 14 years ago

I know we are currently frozen, but this looks relatively simple, and can be applied once we are unfrozen.

I am a newb to fedora's puppet practices, so I attached a patch for review/comment/flame :)

As for the git-web stuff: I notice some of the other F/LOSS git-web instances are disallowing everything with robots.txt.
See:
http://git.kernel.org/robots.txt
http://git.postgresql.org/robots.txt

Thanks
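
For reference, the "disallow everything" approach mentioned above can be checked locally with Python's standard `urllib.robotparser`. This is only a sketch: the robots.txt text below is the generic disallow-all form rather than a copy of either site's actual file, and `git.example.org` and the helper name are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Generic "block all crawlers" robots.txt (assumed content, not fetched
# from any of the sites listed above).
DISALLOW_ALL = """\
User-agent: *
Disallow: /
"""


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder snapshot URL on a hypothetical git-web host.
snapshot_url = "http://git.example.org/project.git/snapshot/master.tar.gz"
print(crawler_may_fetch(DISALLOW_ALL, "Googlebot", snapshot_url))
```

Note that `urllib.robotparser` does plain prefix matching, so it is suitable for checking a blanket `Disallow: /` but not wildcard patterns.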