Re: no mailing list hits in google

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL WWW <pgsql-www(at)lists(dot)postgresql(dot)org>
Subject: Re: no mailing list hits in google
Date: 2019-08-28 17:59:40
Message-ID: 23549.1567015180@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-www

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> It blocks /list/ which has the subjects only. The actual emails in
> /message-id/ are not blocked by robots.txt. I don't know why they stopped
> appearing in the searches... Nothing has been changed around that for many
> years from *our* side.

If I go to

https://www.postgresql.org/message-id/

I get a page saying "Not Found". So I'm not clear on how a web crawler
would descend through that to individual messages.
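
As a rough illustration of the robots.txt side of this, here is a minimal
sketch using Python's standard urllib.robotparser. The rules in
ASSUMED_ROBOTS_TXT and the "ExampleBot" user agent are assumptions made up
for the example (the thread only says that /list/ is blocked); this is not
a copy of the real file.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules, for illustration only; the real robots.txt may differ.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /list/
"""

BASE_URL = "https://www.postgresql.org"


def allowed(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if the assumed rules let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ASSUMED_ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, BASE_URL + path)


if __name__ == "__main__":
    for path in ("/list/pgsql-hackers/", "/message-id/some-message-id"):
        verdict = "allowed" if allowed(path) else "blocked"
        print(f"{path} -> {verdict}")
```

Under these assumed rules a /message-id/ URL is fetchable, but being
allowed is not the same as being discoverable: the crawler still needs a
link to follow to get there.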

Even if it looks different to a robot, what would it look like exactly?
A flat space of umpteen zillion immediate-child pages? It seems not
improbable that Google's search engine would intentionally decide not to
index that, or unintentionally just fail due to some internal resource
limit. (This theory can explain why it used to work and no longer does:
we got past whatever the limit is.)

Andres' idea of allowing access to /list/ would allow the archives to be
traversed in more bite-size pieces, which might fix the issue.
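
The reachability argument can be sketched with a toy crawler. Everything
below is hypothetical: LINKS is an invented miniature link graph, and
crawl() is a plain breadth-first traversal that skips blocked path
prefixes. It says nothing about how Google's crawler actually behaves.

```python
from collections import deque

# Hypothetical miniature link graph: page -> pages it links to.
LINKS = {
    "/": ["/list/"],
    "/list/": ["/list/pgsql-hackers/"],
    "/list/pgsql-hackers/": ["/message-id/msg-1", "/message-id/msg-2"],
    "/message-id/msg-1": ["/message-id/msg-2"],
    "/message-id/msg-2": ["/message-id/msg-1"],
}


def crawl(start, blocked_prefixes):
    """Breadth-first crawl from `start`, never visiting blocked prefixes."""
    blocked = tuple(blocked_prefixes)
    seen = set()
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page in seen or page.startswith(blocked):
            continue
        seen.add(page)
        queue.extend(LINKS.get(page, []))
    return seen


if __name__ == "__main__":
    print(sorted(crawl("/", ["/list/"])))  # /list/ blocked
    print(sorted(crawl("/", [])))          # /list/ allowed
```

In this toy graph the messages are only linked from pages under /list/, so
blocking /list/ leaves them unreachable even though /message-id/ itself is
not blocked. A real crawler can also learn URLs from external links or
sitemaps, which this sketch ignores.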

regards, tom lane
