Re: robots.txt sometimes disallowing all?

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Josh Kupershmidt <schmiddy(at)gmail(dot)com>
Cc: "w^3" <pgsql-www(at)postgresql(dot)org>
Subject: Re: robots.txt sometimes disallowing all?
Date: 2014-06-24 14:15:13
Message-ID: CABUevEyGoz59TfKNAa4jsMKzLr9dSDAgTnHM_RvC6W1kaa5vwA@mail.gmail.com
Lists: pgsql-www

Thanks for the diagnostics! I suspected it was something like that, but
I somehow misplaced your original report and therefore didn't
investigate it further.

I will take a look at it tonight.

//Magnus

On Tue, Jun 24, 2014 at 4:05 PM, Josh Kupershmidt <schmiddy(at)gmail(dot)com>
wrote:

> This behavior seems to still be going on, but I think I have a clue. I
> noticed while experimenting with:
>
> wget -O robots.txt http://www.postgresql.org/robots.txt && cat robots.txt
>
> that wget lists the addresses it found in DNS for www.postgresql.org:
>
> Resolving www.postgresql.org... 87.238.57.232, 217.196.149.50,
> 174.143.35.230
>
> When wget connects to 217.196.149.50 or 87.238.57.232, I get the normal
> robots.txt. When it connects to 174.143.35.230, I get the bad version
> disallowing all access to the site. BTW, this behavior does not seem to
> depend on the user-agent string, contrary to my earlier
> speculation. Could someone please check out what's going on with
> robots.txt on 174.143.35.230, as it seems to seriously be screwing
> with our Google search results.
>
> Josh
>
> On Wed, Jun 18, 2014 at 9:26 AM, Josh Kupershmidt <schmiddy(at)gmail(dot)com>
> wrote:
> > I noticed an unusual search result shown as the top result by Google
> > (search query "POSTGRESQL DROP TRIGGER", first result for me leads to
> > www.postgresql.org/docs/8.3/static/sql-droptrigger.html ). The title
> > of the result is somehow "英語 - PostgreSQL" ("英語" is Japanese for "English"), and below that title
> > reads: "A description for this result is not available because of this
> > site's robots.txt – learn more."
> >
> > Sure enough, when I checked http://www.postgresql.org/robots.txt in
> > Chrome on OS X, I see:
> >
> > User-agent: *
> > Disallow: /
> >
> > though when I check in other browsers (Safari, wget), I see a more
> > reasonable robots.txt:
> >
> > ===
> > User-agent: *
> > Disallow: /admin/
> > Disallow: /account/
> > Disallow: /docs/devel/
> > Disallow: /list/
> > Disallow: /search/
> > Disallow: /message-id/raw/
> > Disallow: /message-id/flat/
> >
> > Sitemap: http://www.postgresql.org/sitemap.xml
> > ===
> >
> > Is it intentional that we're serving up that first robots.txt to
> > (apparently) Googlebot and Chrome?
> >
> > Josh
>
>
> --
> Sent via pgsql-www mailing list (pgsql-www(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-www
>
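A minimal Python sketch of the per-server check Josh describes, assuming Python 3.9+ and that each server answers plain HTTP on port 80, as in the wget command above. Rather than relying on whichever address the resolver hands out first, it resolves every IPv4 address for the host and requests /robots.txt from each one directly, sending the site's Host header. It then uses the standard library's urllib.robotparser to ask whether a crawler may fetch the DROP TRIGGER page from the original report. The helper names (resolve_ipv4, fetch_robots, allows, check_all), the "Googlebot" agent string, and the sample URL are illustrative assumptions. Nothing here is tooling that postgresql.org actually provides.

```python
import http.client
import socket
import urllib.robotparser

# Illustrative values taken from the thread (hypothetical configuration).
HOST = "www.postgresql.org"
SAMPLE_URL = "http://www.postgresql.org/docs/8.3/static/sql-droptrigger.html"
AGENT = "Googlebot"  # assumed crawler name for the can_fetch() check


def resolve_ipv4(host: str) -> list[str]:
    """Return each distinct IPv4 address DNS reports for host, in resolver order."""
    infos = socket.getaddrinfo(host, 80, socket.AF_INET, socket.SOCK_STREAM)
    addrs: list[str] = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        if sockaddr[0] not in addrs:
            addrs.append(sockaddr[0])
    return addrs


def fetch_robots(ip: str, host: str, timeout: float = 10.0) -> str:
    """GET /robots.txt from one specific server, sending the site's Host header."""
    conn = http.client.HTTPConnection(ip, 80, timeout=timeout)
    try:
        # An explicit Host header stops http.client from sending the raw IP instead.
        conn.request("GET", "/robots.txt", headers={"Host": host})
        resp = conn.getresponse()
        body = resp.read().decode("utf-8", errors="replace")
        if resp.status != 200:
            # Redirects and errors are reported rather than parsed as robots.txt.
            raise RuntimeError(f"{ip}: HTTP {resp.status} for /robots.txt")
        return body
    finally:
        conn.close()


def allows(robots_txt: str, url: str, agent: str = AGENT) -> bool:
    """True if this robots.txt text lets `agent` crawl `url`."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def check_all(host: str = HOST, url: str = SAMPLE_URL) -> dict[str, bool]:
    """Map each server address to whether its robots.txt allows `url` (needs network)."""
    return {ip: allows(fetch_robots(ip, host), url) for ip in resolve_ipv4(host)}
```

Calling check_all() returns one entry per resolved address. An address mapped to False is serving a robots.txt that blocks the sample page, as the "User-agent: * / Disallow: /" version quoted above does. The longer version only blocks the listed prefixes such as /docs/devel/. IPv6 addresses are deliberately skipped to match the addresses wget reported in the thread.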

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
