Re: Post-2018 messages in archives

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL WWW <pgsql-www(at)postgresql(dot)org>
Subject: Re: Post-2018 messages in archives
Date: 2018-12-06 12:26:15
Message-ID: CABUevEzPiDjcFvMWVgOjbXOnLC+nWnZW-0PRX7O_PKSxoJtKrQ@mail.gmail.com
Lists: pgsql-www

On Thu, Dec 6, 2018 at 7:14 AM Noah Misch <noah(at)leadboat(dot)com> wrote:

> On Wed, Dec 05, 2018 at 11:31:39PM -0500, Tom Lane wrote:
> > Noah Misch <noah(at)leadboat(dot)com> writes:
> > > On Wed, Dec 05, 2018 at 09:39:18AM +0100, Magnus Hagander wrote:
> > >>> Unfortunately we don't keep the ingest time separately. But for the future,
> > >>> doing so would probably be a good idea, for other reasons as well.
> >
> > > Works for me. Pondering it more, the timestamp that matters most for archive
> > > purposes is the timestamp at which list subscribers started to receive their
> > > copies of the message. Based on that, I'm thinking we should ignore the Date
> > > header and always use the timestamp from a particular "Received ... by
> > > HOSTNAME.postgresql.org" header. Before settling on that, I'd want to check
> > > how many messages change timestamp by more than ~100s, and I'd want to spot
> > > check a few messages to see whether the change looks like an improvement.
> >
> > Another point worth considering here is moderation queue delays, which
> > are not infrequently measured in days :-(. I am not quite sure whether
> > it'd be better to tag a moderation-delayed message with the timestamp
> > when it entered the queue or the time when it exited. But either one
> > would be better than believing the Date: header.
>
> Good point. I'd prefer to use the time when it exited the queue, which
> conforms to "timestamp at which list subscribers started to receive their
> copies of the message" mentioned above. I usually download November's mbox in
> the first few days of December. If we use the timestamp of entering the queue
> (or the Date header), there's no particular upper bound on when the November
> mbox stops accruing new messages.
>

Given that this has happened 10 times across 1.25 million messages, I
really can't get excited about building any form of complicated solution
for it. :)

So for this, just using the automatic timestamp assigned to the row when it
enters the archives should do. Normally it will differ by only a second or a
few from the suggestions above, and it would only grow larger if the archives
server was temporarily down or there were other delivery issues.
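
As a rough illustration of the idea (a minimal sketch, not the actual
archives code): the monthly bucket is derived from the time the row was
ingested, and the sender-supplied Date header is parsed only for display or
comparison. The function names and the sample message below are hypothetical.

```python
from datetime import datetime, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def date_header_time(raw_message):
    """Return the sender-supplied Date header as an aware UTC datetime, or None."""
    msg = message_from_string(raw_message)
    value = msg.get("Date")
    if value is None:
        return None
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Malformed Date headers are common in the wild; treat them as missing.
        return None
    if parsed.tzinfo is None:
        # A "-0000" zone yields a naive datetime; assume UTC in that case.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def archive_month(ingested_at):
    """Pick the monthly mbox bucket from the ingest time, ignoring Date."""
    return ingested_at.astimezone(timezone.utc).strftime("%Y-%m")


if __name__ == "__main__":
    # Hypothetical message whose Date header claims a time in the future.
    raw = "Date: Tue, 01 Jan 2019 00:00:00 +0000\nSubject: test\n\nbody\n"
    ingested_at = datetime(2018, 12, 5, 8, 0, tzinfo=timezone.utc)
    print("Date header says:", date_header_time(raw))
    print("Archived under:  ", archive_month(ingested_at))
```

With this approach a message with a bogus future Date still lands in the
month it actually arrived, so an old month's mbox stops changing once that
month is over.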

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
