From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: robots.txt on git.postgresql.org
Date: 2013-07-10 08:36:06
Message-ID: CABUevEyUM-CEmmBcHmX6VrnkHj8O7xYk6ZvfdSfk-T8O4jd-Vw@mail.gmail.com
Lists: pgsql-hackers
On Wed, Jul 10, 2013 at 10:25 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> On 07/09/2013 11:30 PM, Andres Freund wrote:
>> On 2013-07-09 16:24:42 +0100, Greg Stark wrote:
>>> I note that git.postgresql.org's robots.txt refuses permission to crawl
>>> the git repository:
>>>
>>> http://git.postgresql.org/robots.txt
>>>
>>> User-agent: *
>>> Disallow: /
>>>
>>>
>>> I'm curious what motivates this. It's certainly useful to be able to
>>> search for commits.
>>
>> Gitweb is horribly slow. I don't think anybody with a bigger git repo
>> using gitweb can afford to let all the crawlers go through it.
>
> Wouldn't whacking a reverse proxy in front be a pretty reasonable
> option? There's a disk space cost, but using Apache's mod_proxy or
> similar would do quite nicely.
We already run one; that's what we did to make the site survive at all. The
problem is that there are many thousands of different URLs you can reach
on that site, and Google indexes them all by default.
It was before we had the reverse proxy that the site regularly died.
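For context, a minimal sketch of the kind of caching reverse proxy being
described, assuming Apache 2.4 with mod_proxy and mod_cache_disk; the backend
hostname and cache path are illustrative assumptions, not the actual
git.postgresql.org configuration:

    # Requires mod_proxy, mod_proxy_http, mod_cache and mod_cache_disk
    # to be loaded. Hostnames and paths below are placeholders.
    <VirtualHost *:80>
        ServerName git.postgresql.org

        # Cache rendered gitweb pages on disk so repeated requests
        # (crawlers included) do not re-run the slow gitweb CGI.
        CacheRoot /var/cache/apache2/gitweb
        CacheEnable disk /
        CacheDefaultExpire 3600
        CacheIgnoreNoLastMod On

        # Forward everything else to the backend that actually runs gitweb.
        ProxyPass        / http://gitweb-backend.example.org/
        ProxyPassReverse / http://gitweb-backend.example.org/
    </VirtualHost>

Even with such a cache, the combinatorial number of distinct gitweb URLs
(every commit, diff, blame and snapshot view) means a crawler can still
generate mostly cache misses, which is why robots.txt blocks crawling outright.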
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/