From: | Magnus Hagander <magnus(at)hagander(dot)net> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL WWW <pgsql-www(at)postgresql(dot)org> |
Subject: | Re: Post-2018 messages in archives |
Date: | 2018-12-06 12:26:15 |
Message-ID: | CABUevEzPiDjcFvMWVgOjbXOnLC+nWnZW-0PRX7O_PKSxoJtKrQ@mail.gmail.com |
Lists: | pgsql-www |
On Thu, Dec 6, 2018 at 7:14 AM Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Wed, Dec 05, 2018 at 11:31:39PM -0500, Tom Lane wrote:
> > Noah Misch <noah(at)leadboat(dot)com> writes:
> > > On Wed, Dec 05, 2018 at 09:39:18AM +0100, Magnus Hagander wrote:
> > >>> Unfortunately we don't keep the ingest time separately. But for the future,
> > >>> doing so would probably be a good idea, for other reasons as well.
> >
> > > Works for me. Pondering it more, the timestamp that matters most for archive
> > > purposes is the timestamp at which list subscribers started to receive their
> > > copies of the message. Based on that, I'm thinking we should ignore the Date
> > > header and always use the timestamp from a particular "Received ... by
> > > HOSTNAME.postgresql.org" header. Before settling on that, I'd want to check
> > > how many messages change timestamp by more than ~100s, and I'd want to spot
> > > check a few messages to see whether the change looks like an improvement.
> >
> > Another point worth considering here is moderation queue delays, which
> > are not infrequently measured in days :-(. I am not quite sure whether
> > it'd be better to tag a moderation-delayed message with the timestamp
> > when it entered the queue or the time when it exited. But either one
> > would be better than believing the Date: header.
>
> Good point. I'd prefer to use the time when it exited the queue, which
> conforms to "timestamp at which list subscribers started to receive their
> copies of the message" mentioned above. I usually download November's mbox in
> the first few days of December. If we use the timestamp of entering the queue
> (or the Date header), there's no particular upper bound on when the November
> mbox stops accruing new messages.
>
Given that this has happened 10 times across 1.25 million messages, I
really can't get excited about building any form of complicated solution
for it. :)
So for this, just using the automatic timestamp assigned to the row when it
enters the archives should do. Normally it will differ from the suggestions
above by only a second or a few, and the gap would only grow to something
bigger if the archives server was temporarily down or there were other
delivery issues.
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
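For illustration, the check Noah proposes above (how many messages would move by more than ~100s if the archive used the timestamp from a "Received ... by HOSTNAME.postgresql.org" header instead of the Date header) might be sketched roughly as below. This is a hypothetical Python sketch, not code from the archives software: the function names, the `.postgresql.org` host-suffix match, and the choice of the topmost matching Received header are all assumptions.

```python
import re
from datetime import timezone
from email import message_from_string
from email.utils import parsedate_to_datetime

# Matches the host after "by" in a Received header, e.g. "by relay.postgresql.org".
BY_HOST = re.compile(r"\bby\s+(\S+)", re.IGNORECASE)


def _parse_date(value):
    """Parse an RFC 5322 date; return an aware datetime (UTC if no zone) or None."""
    try:
        dt = parsedate_to_datetime(" ".join(value.split()))
    except (TypeError, ValueError):  # unparseable date (error type varies by Python version)
        return None
    if dt.tzinfo is None:  # a "-0000" zone yields a naive datetime; treat it as UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


def received_timestamp(msg, host_suffix=".postgresql.org"):
    """Timestamp of the topmost Received header added by a host ending in host_suffix.

    Assumption: host_suffix is lowercase; which hop to pick is a policy choice.
    """
    for value in msg.get_all("Received", []):
        # Received headers look like "from A by B ...; <date>": the date follows the last ';'.
        head, sep, date_part = value.rpartition(";")
        if not sep:
            continue
        match = BY_HOST.search(head)
        if match and match.group(1).lower().endswith(host_suffix):
            return _parse_date(date_part)
    return None


def timestamp_shift(raw, host_suffix=".postgresql.org"):
    """Seconds from the Date header to the chosen Received timestamp, or None if either is missing."""
    msg = message_from_string(raw)
    date_hdr = msg.get("Date")
    sent = _parse_date(date_hdr) if date_hdr else None
    received = received_timestamp(msg, host_suffix)
    if sent is None or received is None:
        return None
    return (received - sent).total_seconds()


def count_large_shifts(raw_messages, threshold=100.0, host_suffix=".postgresql.org"):
    """Count messages whose archive timestamp would move by more than threshold seconds."""
    shifts = (timestamp_shift(raw, host_suffix) for raw in raw_messages)
    return sum(1 for s in shifts if s is not None and abs(s) > threshold)
```

Magnus's reply sidesteps this parsing entirely: the row's insertion time is recorded by the archives themselves, so no header has to be trusted.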