Re: Post-2018 messages in archives

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: PostgreSQL WWW <pgsql-www(at)postgresql(dot)org>
Subject: Re: Post-2018 messages in archives
Date: 2018-12-03 09:08:20
Message-ID: CABUevEyOB-hvB559cqC=CdH9Bt6Wj48c_0TMytw4P0CQACTe0A@mail.gmail.com
Lists: pgsql-www

On Mon, Dec 3, 2018 at 2:40 AM Noah Misch <noah(at)leadboat(dot)com> wrote:

> At some point in the last few months, the archives of many mailing lists
> gained messages dated far in the future. For example, the pgsql-hackers
> archives gained four messages from the years 2030, 2032 and 2036:
>
> https://www.postgresql.org/list/pgsql-hackers/since/203011010000/
>
> This disrupts my use of the "Next" link. If you're looking at the last page
> of messages and click "Next", you'll get a page with just the single latest
> message. Normally, if you refresh that page later, you'll see the messages
> added after you clicked "Next". With the far-future messages in there,
> "Next" brings one to
> https://www.postgresql.org/list/pgsql-hackers/since/203602080620
> which won't get new messages regularly for another 18 years.
>
> Perhaps the fix is to set the archive date to the archive's ingest time when
> the message asserts a date substantially (15 min?) earlier or later than
> that. Would that be an improvement?
>
>
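The "Next" behavior described in the quoted message can be modelled roughly as follows. This is a sketch under stated assumptions, not the archives' actual code: it assumes a since/<stamp> page lists messages dated at or after the stamp in date order, and that "Next" points at the date of the last message shown. The helpers page_at, next_link and stamp are hypothetical names.

```python
from datetime import datetime

def page_at(dates, since, page_size):
    """Hypothetical since/<stamp> page: the first page_size dates >= since."""
    return sorted(d for d in dates if d >= since)[:page_size]

def next_link(dates, since, page_size):
    """Assumed "Next" target: a page starting at the last message shown."""
    page = page_at(dates, since, page_size)
    return page[-1] if page else None

def stamp(d):
    # Same YYYYMMDDHHMM layout as the archive URLs, e.g. 203602080620.
    return d.strftime("%Y%m%d%H%M")

dates = [datetime(2018, 12, 2, 10, 0), datetime(2018, 12, 2, 23, 0),
         datetime(2036, 2, 8, 6, 20)]  # one far-future message
target = next_link(dates, datetime(2018, 12, 2), page_size=50)
print(stamp(target))  # 203602080620

dates.append(datetime(2018, 12, 3, 9, 8))  # a new, correctly dated message
# The new message sorts before `target`, so the "Next" page still shows only one.
print(len(page_at(dates, target, page_size=50)))  # 1
```

Without the 2036 message, the same model would put "Next" at the 2018-12-02 23:00 message, and refreshing that page would pick up the new one.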
I wonder what caused this. I did a full reparse of the archives last week.
Perhaps that caused it: we may actually have had this problem before,
cleaned it up manually at some point, and then had that manual cleanup
overwritten by the reparse.

Unfortunately, we don't store the ingest time separately. But for the future,
doing so would probably be a good idea, for other reasons as well. I think
15 minutes might be pushing it a bit given the kinds of timestamps we see,
in particular from incorrectly configured time zones. Something like 24h
would probably work, though.
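A minimal sketch of the clamping rule discussed above, assuming the archiver records an ingest time for each message (which, as noted, it currently does not) and that both timestamps are timezone-aware. archive_date and MAX_SKEW are hypothetical names; the 24h default follows the suggestion above, and the 15-minute figure from the quoted proposal can be passed instead.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: 24h as suggested; 15 minutes was the original proposal.
MAX_SKEW = timedelta(hours=24)

def archive_date(asserted, ingested, max_skew=MAX_SKEW):
    """Trust the message's Date: header unless it is more than max_skew away
    from the ingest time, in either direction; then use the ingest time.
    Both arguments must be timezone-aware datetimes."""
    if abs(asserted - ingested) > max_skew:
        return ingested
    return asserted

ingested = datetime(2018, 12, 3, 9, 8, tzinfo=timezone.utc)
# A sender with a clock 2 hours off (e.g. a wrong time zone) keeps its date.
print(archive_date(ingested - timedelta(hours=2), ingested))  # 2018-12-03 07:08:00+00:00
# A message claiming to be from 2036 is filed under its ingest time instead.
print(archive_date(datetime(2036, 2, 8, 6, 20, tzinfo=timezone.utc), ingested))  # 2018-12-03 09:08:00+00:00
```

Using abs() covers both directions in the proposal, dates claimed too far in the past as well as in the future. Keeping the raw Date: header alongside the ingest time would presumably let the threshold be changed later, and the archives reparsed, without losing information.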

Luckily, it's not too terribly bad:

archives=# select count(*) from messages where date > now();
 count
-------
    10
(1 row)

(out of about 1.3M messages).

So short-term I will go process those messages manually.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
