From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: robots.txt on git.postgresql.org
Date: 2013-07-11 13:50:58
Message-ID: 20130711135058.GG27898@alap2.anarazel.de
Lists: pgsql-hackers
On 2013-07-11 14:43:21 +0100, Greg Stark wrote:
> On Wed, Jul 10, 2013 at 9:36 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> > We already run this, that's what we did to make it survive at all. The
> > problem is there are so many thousands of different URLs you can get
> > to on that site, and google indexes them all by default.
>
> There's also https://support.google.com/webmasters/answer/48620?hl=en
> which lets us control how fast the Google crawler crawls. I think it's
> adaptive though so if the pages are slow it should be crawling slowly
The problem is that gitweb gives you access to more than a million
pages...
Revisions: git rev-list --all origin/master|wc -l => 77123
Branches: git branch --all|grep origin|wc -l
Views per commit: commit, commitdiff, tree
So, slow crawling isn't going to help very much.
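A rough sketch of that arithmetic, run inside a clone of the repository (the
views-per-commit multiplier comes from the list above; the per-branch and
per-file pages in the trailing comment are my assumption, not something
measured here):

    # Back-of-the-envelope count of distinct gitweb URLs a crawler can reach.
    revs=$(git rev-list --all origin/master | wc -l)    # 77123 at the time of writing
    branches=$(git branch --all | grep origin | wc -l)  # remote-tracking branches
    views_per_commit=3                                  # commit, commitdiff, tree
    echo $(( revs * views_per_commit ))                 # already well over 200,000
    # Add per-branch log/shortlog pages, per-file blob and history views,
    # snapshots, etc., and the total easily climbs past a million URLs.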
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services