Patch #3754
open
add some additional URL paths to robots.txt
Added by mark burdett over 15 years ago.
Updated almost 10 years ago.
Description
My Apache logs show that robots are heavily crawling some Redmine URLs, and it seems best to block them in robots.txt:
- /issues
- /projects/*/time_entries
- /projects/N/wiki/* (where N is the numeric project ID)
- /repositories/annotate/*
- /repositories/browse/*
- /repositories/changes/*
- /repositories/diff/*
- /repositories/entry/*
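To see what these paths would actually block, here is a minimal sketch, assuming crawlers follow the RFC 9309 pattern rules (rules match by prefix, `*` is a wildcard, a trailing `$` anchors the end). The list is a transcription of the paths above, not the attached patch. The numeric-project-id wiki rule is left out because robots.txt patterns cannot restrict a path segment to digits. `PROPOSED_DISALLOW`, `pattern_to_regex` and `is_blocked` are hypothetical names used only for illustration.

```python
import re

# Paths from the description, written as robots.txt Disallow patterns.
# "*" is the RFC 9309 wildcard; a trailing "*" is redundant because rules
# already match by prefix. The numeric-project-id wiki rule is omitted:
# robots.txt patterns cannot restrict a segment to digits.
PROPOSED_DISALLOW = [
    "/issues",
    "/projects/*/time_entries",
    "/repositories/annotate/*",
    "/repositories/browse/*",
    "/repositories/changes/*",
    "/repositories/diff/*",
    "/repositories/entry/*",
]


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex matched from the start."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    # Escape the literal text and turn each "*" into ".*".
    body = ".*".join(re.escape(part) for part in core.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked(path: str, patterns: list) -> bool:
    """Return True if any Disallow pattern matches the start of `path`."""
    return any(pattern_to_regex(p).match(path) for p in patterns)


if __name__ == "__main__":
    for path in [
        "/issues?set_filter=1",
        "/issues/42",
        "/projects/redmine/time_entries",
        "/repositories/diff/redmine?rev=100",
        "/projects/redmine/wiki/Start",
    ]:
        verdict = "blocked" if is_blocked(path, PROPOSED_DISALLOW) else "allowed"
        print(f"{path}: {verdict}")
```

Because Disallow rules are prefix matches, the bare `/issues` rule also blocks individual issue pages such as `/issues/42`, not just the global list. Crawlers that predate the wildcard extension may not honour `/projects/*/time_entries` at all.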
See the Bots Filter plugin, which overlaps with some of this (e.g. the repository paths). Perhaps you could adapt it to your exact requirements?
Regards,
Mischa.
I could also easily block these in my Apache config, but I do think they should be in the default robots.txt. I also wonder how Googlebot and other crawlers are finding some of these non-canonical paths in the first place. It could point to a bug elsewhere that generates links to them.
Here's a patch adding the additional problematic paths to the default robots.txt.
I like having robots crawl some of these pages (they even turn up when I search for a bug I've already fixed):
- wiki pages
- global issues list
- repositories
The wiki pages that this patch blocks are not the canonical paths; they use the numeric project ID rather than the project name.
I now realize that the initial version of this patch also blocked the individual issue pages; I intended to block only /issues? (i.e. the global issue search page).
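One way to block only the global list is a pair of rules, sketched here assuming the crawler supports RFC 9309 matching: `/issues$` for the bare list URL and `/issues?` for its query-string variants. In robots.txt `?` has no special meaning, so `/issues?` is just a literal prefix. The rule list and helper names below are hypothetical and not taken from any of the attached patches.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """RFC 9309-style match: '*' is a wildcard, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in core.split("*"))
    return re.match(regex + (r"\Z" if anchored else ""), path) is not None


# Hypothetical narrower rules for the global issue list only:
#   "/issues$" blocks the bare list URL (needs "$" support in the crawler);
#   "/issues?" is a plain prefix matching any query-string variant.
GLOBAL_ISSUE_LIST_RULES = ["/issues$", "/issues?"]


def is_blocked(path: str) -> bool:
    return any(rule_matches(p, path) for p in GLOBAL_ISSUE_LIST_RULES)


for path in ["/issues", "/issues?set_filter=1&page=2", "/issues/42"]:
    print(f"{path}: {'blocked' if is_blocked(path) else 'allowed'}")
```

This prints `blocked` for the first two paths and `allowed` for `/issues/42`, so individual issues stay crawlable.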
- Tracker changed from Defect to Patch
My site is also getting hammered on /repositories and /issues. It seems somewhat pointless to disallow these resources under /projects/... but not at their other URLs.
This patch has been ready for more than three years; why hasn't it been committed yet?
Here's an updated patch for 1.4.
Antoine Beaupré wrote:
Here's an updated patch for 1.4.
And that was two years ago now, with the patch sitting here for five years. Can we at least get feedback on what's wrong with the patch, if anything?
Thanks.