From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: robots.txt on git.postgresql.org
Date: 2013-07-11 14:05:59
Message-ID: CABUevEy9pS6ERtg3xqzo31wv_93br=AzEHfbeM5m4kidypgTRA@mail.gmail.com
Lists: pgsql-hackers
On Thu, Jul 11, 2013 at 3:43 PM, Greg Stark <stark(at)mit(dot)edu> wrote:
> On Wed, Jul 10, 2013 at 9:36 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> We already run this, that's what we did to make it survive at all. The
>> problem is there are so many thousands of different URLs you can get
>> to on that site, and google indexes them all by default.
>
> There's also https://support.google.com/webmasters/answer/48620?hl=en
> which lets us control how fast the Google crawler crawls. I think it's
> adaptive though, so if the pages are slow it should be crawling slowly.
Sure, but there are plenty of other search engines as well, not just
Google... Google is actually "reasonably good" at scaling back its
own speed, in my experience, which is not true of all the others. Of
course, that also means it takes a long time to actually crawl the
site, since there are so many different URLs...
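For what it's worth, a minimal robots.txt sketch of the kind of
per-crawler throttling being discussed might look like the following.
The gitweb path is illustrative only, not the actual layout of
git.postgresql.org, and Crawl-delay is only honored by some crawlers
(e.g. Bing and Yandex); Googlebot ignores it and has to be tuned
through the Webmaster Tools page linked above:

    # Honored by crawlers such as Bing and Yandex; Googlebot ignores
    # Crawl-delay and is rate-limited via Webmaster Tools instead.
    User-agent: *
    Crawl-delay: 10
    # Keep all robots out of the expensive, effectively unbounded
    # gitweb URL space (path is illustrative, not the real config).
    Disallow: /gitweb/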
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/