From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Magnus Hagander <magnus(at)hagander(dot)net> |
Cc: | Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL WWW <pgsql-www(at)lists(dot)postgresql(dot)org> |
Subject: | Re: no mailing list hits in google |
Date: | 2019-08-28 17:59:40 |
Message-ID: | 23549.1567015180@sss.pgh.pa.us |
Lists: | pgsql-hackers pgsql-www |
Magnus Hagander <magnus(at)hagander(dot)net> writes:
> It blocks /list/ which has the subjects only. The actual emails in
> /message-id/ are not blocked by robots.txt. I don't know why they stopped
> appearing in the searches... Nothing has been changed around that for many
> years from *our* side.
If I go to

https://www.postgresql.org/message-id/

I get a page saying "Not Found". So I'm not clear on how a web crawler
would descend through that to individual messages.

Even if it looks different to a robot, what would it look like exactly?
A flat space of umpteen zillion immediate-child pages? It seems not
improbable that Google's search engine would intentionally decide not to
index that, or unintentionally just fail due to some internal resource
limit. (This theory can explain why it used to work and no longer does:
we got past whatever the limit is.)

Andres' idea of allowing access to /list/ would allow the archives to be
traversed in more bite-size pieces, which might fix the issue.

regards, tom lane
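
The robots.txt behaviour described in the quoted text (/list/ blocked, /message-id/ not blocked) can be checked with a short script. This is only a sketch: the rules in `ROBOTS_TXT` are an assumption reconstructed from that description, not the site's actual file, and the helper name `is_allowed` and the `Googlebot` agent string are illustrative choices.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: a stand-in robots.txt based on the description in the thread
# (/list/ disallowed, /message-id/ not mentioned). The real file may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /list/
"""

BASE_URL = "https://www.postgresql.org"


def is_allowed(path, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    # parse() takes the file's contents as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, BASE_URL + path)


if __name__ == "__main__":
    for path in ("/list/pgsql-hackers/",
                 "/message-id/23549.1567015180@sss.pgh.pa.us"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Under these assumed rules the /list/ path is reported as blocked and the /message-id/ path as allowed. That only shows a crawler is permitted to fetch a message; it says nothing about whether the crawler can discover the URL in the first place, which is the concern raised above.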