From: | Magnus Hagander <magnus(at)hagander(dot)net> |
---|---|
To: | Josh Kupershmidt <schmiddy(at)gmail(dot)com> |
Cc: | "w^3" <pgsql-www(at)postgresql(dot)org> |
Subject: | Re: robots.txt sometimes disallowing all? |
Date: | 2014-06-24 14:15:13 |
Message-ID: | CABUevEyGoz59TfKNAa4jsMKzLr9dSDAgTnHM_RvC6W1kaa5vwA@mail.gmail.com |
Lists: | pgsql-www |
Thanks for the diagnostics! I suspected it was something like that, but I
somehow managed to misplace your original report and therefore didn't
investigate it further.
I will take a look at it tonight.
//Magnus
On Tue, Jun 24, 2014 at 4:05 PM, Josh Kupershmidt <schmiddy(at)gmail(dot)com>
wrote:
> This behavior seems to still be going on, but I think I have a clue. I
> noticed while experimenting with:
>
> wget -O robots.txt http://www.postgresql.org/robots.txt && cat robots.txt
>
> that wget tells me the available servers for www.postgresql.org it has
> found in DNS:
>
> Resolving www.postgresql.org... 87.238.57.232, 217.196.149.50,
> 174.143.35.230
>
> When I land on 217.196.149.50 or 87.238.57.232, I get the normal
> robots.txt. When I land on 174.143.35.230, I get the bad version
> disallowing all access to the site. BTW, this behavior does not seem
> to depend on the user-agent string, contrary to my earlier
> speculation. Could someone please check out what's going on with
> robots.txt on 174.143.35.230, as it seems to be seriously screwing
> with our Google search results.
>
> Josh
>
> On Wed, Jun 18, 2014 at 9:26 AM, Josh Kupershmidt <schmiddy(at)gmail(dot)com>
> wrote:
> > I noticed an unusual search result shown as the top result by Google
> > (search query "POSTGRESQL DROP TRIGGER", first result for me leads to
> > www.postgresql.org/docs/8.3/static/sql-droptrigger.html ). The title
> > of the result is somehow "英語 - PostgreSQL", and below that title
> > reads: "A description for this result is not available because of this
> > site's robots.txt – learn more."
> >
> > Sure enough, when I checked http://www.postgresql.org/robots.txt in
> > Chrome on OS X, I see:
> >
> > User-agent: *
> > Disallow: /
> >
> > though when I check in other browsers (Safari, wget), I see a more
> > reasonable robots.txt:
> >
> > ===
> > User-agent: *
> > Disallow: /admin/
> > Disallow: /account/
> > Disallow: /docs/devel/
> > Disallow: /list/
> > Disallow: /search/
> > Disallow: /message-id/raw/
> > Disallow: /message-id/flat/
> >
> > Sitemap: http://www.postgresql.org/sitemap.xml
> > ===
> >
> > Is it intentional that we're serving up that first robots.txt to
> > (apparently) Googlebot and Chrome?
> >
> > Josh
>
>
> --
> Sent via pgsql-www mailing list (pgsql-www(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-www
>
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
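
The per-server check described in the quoted report (fetch robots.txt from
each address that DNS returns for www.postgresql.org and see which one serves
the "Disallow: /" version) could be scripted roughly as follows. This is only
a sketch, not something from the thread: the function names are made up for
illustration, it assumes plain HTTP on port 80 as in the wget command above,
and it treats any response body as a robots.txt without checking the status
code.

```python
import http.client
import socket
import urllib.robotparser


def blocks_everything(robots_text, user_agent="*"):
    """Return True if this robots.txt forbids crawling the site root."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_text.splitlines())
    return not parser.can_fetch(user_agent, "/")


def resolve_ipv4(host):
    """Return the sorted, de-duplicated IPv4 addresses DNS gives for host."""
    infos = socket.getaddrinfo(host, 80, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def fetch_robots(ip, host, timeout=10):
    """Fetch /robots.txt from one specific server, sending the site's Host header."""
    conn = http.client.HTTPConnection(ip, 80, timeout=timeout)
    try:
        # Connect to the IP directly but ask for the virtual host by name.
        conn.request("GET", "/robots.txt", headers={"Host": host})
        response = conn.getresponse()
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body
    finally:
        conn.close()


def report(host):
    """Print one line per resolved server saying whether it blocks all crawling."""
    for ip in resolve_ipv4(host):
        status, body = fetch_robots(ip, host)
        verdict = "DISALLOWS ALL" if blocks_everything(body) else "ok"
        print(f"{ip}: HTTP {status}: {verdict}")
```

Calling report("www.postgresql.org") would print one line per server, which
makes it easy to spot a single machine serving a different robots.txt than
the others.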