Patch #3754

add some additional URL paths to robots.txt

Added by mark burdett about 11 years ago. Updated over 5 years ago.

Status: New
Start date: 2009-08-18
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

My Apache logs show that robots are heavily crawling some Redmine URLs, and it seems best to block them in robots.txt:
/issues
/projects/*/time_entries
/projects/N/wiki/* (where N is the numeric project id)
/repositories/annotate/*
/repositories/browse/*
/repositories/changes/*
/repositories/diff/*
/repositories/entry/*
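
As a rough illustration of how a wildcard-aware crawler might interpret the patterns above, here is a minimal Python sketch. It assumes Google-style matching (rules match as prefixes, `*` matches any run of characters, a trailing `$` anchors the end), which is an extension of the original prefix-only robots.txt convention. The names `rule_to_regex` and `is_disallowed` and the sample paths are made up for this example, and the numeric-id wiki rule is left out because robots.txt patterns cannot express "digits only".

```python
import re

def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule into a compiled regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the URL; everything else is a literal prefix.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))

# Paths from the description (the /projects/N/wiki/* rule is omitted).
DISALLOW = [
    "/issues",
    "/projects/*/time_entries",
    "/repositories/annotate/*",
    "/repositories/browse/*",
    "/repositories/changes/*",
    "/repositories/diff/*",
    "/repositories/entry/*",
]

def is_disallowed(path: str) -> bool:
    """Return True if any rule matches the start of the given path."""
    return any(rule_to_regex(rule).match(path) for rule in DISALLOW)

# Hypothetical request paths, for illustration only.
for path in ("/projects/demo/time_entries",
             "/repositories/diff/demo",
             "/projects/demo"):
    print(path, is_disallowed(path))
```

Under these assumptions the bare `/issues` prefix also matches individual issue pages such as `/issues/3754`, so it blocks more than the global issue list.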

robots.txt.patch (587 Bytes) mark burdett, 2009-09-23 11:18

robots.txt.patch (597 Bytes) mark burdett, 2009-10-01 01:38

robots.txt-2.patch (554 Bytes) Antoine Beaupré, 2013-03-11 22:30


Related issues

Related to Redmine - Defect #6734: robots.txt: disallow crawling issues list with a query st... Closed 2010-10-24

History

#1 Updated by Mischa The Evil about 11 years ago

See the Bots Filter plugin which has some overlap (e.g. the repositories). Maybe you can modify it to adapt it to your precise requirements?

Regards,

Mischa.

#2 Updated by mark burdett about 11 years ago

Or I can easily block these via my Apache config. But I do think they should be added to robots.txt by default. I also wonder how Googlebot and others are even finding some of these non-canonical paths. It could point to a bug elsewhere that is generating links to these paths.

#3 Updated by mark burdett about 11 years ago

Here's a patch adding the additional problematic paths to the default robots.txt.

#4 Updated by Eric Davis about 11 years ago

I like having the robots crawl some of these pages; they even turn up when I'm searching for a bug that I've already fixed:

  • wiki pages
  • global issues list
  • repositories

#5 Updated by mark burdett about 11 years ago

The wiki pages that this patch blocks are not the canonical paths; they use the numeric project id rather than the project name.

I now realize that the initial version of this patch blocked the individual issue pages; I intended to only block /issues? -- i.e. the global issue search page.
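
To make the difference concrete, here is a small Python sketch of plain prefix matching, the baseline behaviour robots.txt rules rely on. The helper name `blocked_by` and the sample URLs (including the `set_filter=1` query string) are assumptions for illustration, not taken from the patch.

```python
def blocked_by(path_and_query: str, disallow_prefix: str) -> bool:
    """Plain robots.txt semantics: a rule matches if the URL starts with it."""
    return path_and_query.startswith(disallow_prefix)

# Hypothetical URLs: one individual issue page, one filtered issue list.
issue_page = "/issues/3754"
issue_search = "/issues?set_filter=1"

for rule in ("/issues", "/issues?"):
    print(rule,
          "blocks issue page:", blocked_by(issue_page, rule),
          "| blocks search:", blocked_by(issue_search, rule))
```

With the rule `/issues`, both URLs are blocked; with `/issues?`, only the query-string URL is, and a bare `/issues` request with no query string stays crawlable.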

#6 Updated by Jean-Philippe Lang almost 11 years ago

  • Tracker changed from Defect to Patch

#7 Updated by Brad Schick over 10 years ago

My site is also getting hammered on /repositories and /issues. It seems somewhat pointless to disallow access to these resources through /projects/... but not through other URLs.

#8 Updated by Antoine Beaupré over 7 years ago

This patch has been ready for more than 3 years. Why hasn't it been committed yet?

#9 Updated by Antoine Beaupré over 7 years ago

Here's an updated patch for 1.4.

#10 Updated by Antoine Beaupré over 5 years ago

Antoine Beaupré wrote:

Here's an updated patch for 1.4.

And that was two years ago now, with the patch sitting here for 5 years. Can we at least get feedback on what's wrong with the patch, if anything?

Thanks.
