Re: Flush pgstats file during checkpoints

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Flush pgstats file during checkpoints
Date: 2024-06-29 21:13:04
Message-ID: 41fb6458-7405-4b72-8226-b1d7f976dee5@enterprisedb.com
Lists: pgsql-hackers

On 6/28/24 09:32, Konstantin Knizhnik wrote:
>
> On 18/06/2024 9:01 am, Michael Paquier wrote:
>> Hi all,
>>
>> On HEAD, xlog.c has the following comment, which has been on my own
>> TODO list for a couple of weeks now:
>>
>>       * TODO: With a bit of extra work we could just start with a
>> pgstat file
>>       * associated with the checkpoint redo location we're starting from.
>>
>> Please find a patch series to implement that, giving the possibility
>> to keep statistics after a crash rather than discard them.  I have
>> been looking at the code for a while, before settling down on:
>> - Forcing the flush of the pgstats file to happen during non-shutdown
>> checkpoints and restart points, after updating the control file's redo
>> LSN and the critical sections in the area.
>> - Leaving the current before_shmem_exit() callback around, as a matter
>> of delaying the flush of the stats for as long as possible in a
>> shutdown sequence.  This also makes the single-user mode shutdown
>> simpler.
>> - Adding the redo LSN to the pgstats file, with a bump of
>> PGSTAT_FILE_FORMAT_ID, cross-checked with the redo LSN.  This change
>> is independently useful on its own when loading stats after a clean
>> startup, as well.
>> - The crash recovery case is simplified, as there is no more need for
>> the "discard" code path.
>> - Not using a logic where I would need to stick a LSN into the stats
>> file name, implying that we would need a potential lookup at the
>> contents of pg_stat/ to clean up past files at crash recovery.  These
>> should not be costly, but I'd rather not add more of these.
>>
>> 7ff23c6d277d, which removed the last call of CreateCheckPoint()
>> from the startup process, is older than 5891c7a8ed8f; still, it seems
>> to me that pgstats relies on some areas of the code that don't make
>> sense on HEAD (see the locking mentioned at the top of the write
>> routine, for example).  The logic becomes straightforward to reason
>> about, as restart points and checkpoints always run from the
>> checkpointer, implying that pgstat_write_statsfile() is already
>> called only from the postmaster in single-user mode or from the
>> checkpointer itself, at shutdown.
>>
>> Attached is a patch set, with the one being the actual feature, with
>> some stuff prior to that:
>> - 0001 adds the redo LSN to the pgstats file flushed.
>> - 0002 adds an assertion in pgstat_write_statsfile(), to check from
>> where it is called.
>> - 0003 with more debugging.
>> - 0004 is the meat of the thread.
>>
>> I am adding that to the next CF.  Thoughts and comments are welcome.
>> Thanks,
>> --
>> Michael
>
> Hi Michael.
>
> I am working mostly on the same problem - persisting pgstat state in
> Neon (because of the separation of storage and compute, it has no local
> files). I have two questions concerning this patch and the whole
> strategy for saving pgstat state between sessions.
>
> 1. Right now pgstat.stat is discarded after abnormal Postgres
> termination. And in your patch we are storing the LSN in pgstat.stat to
> check that it matches the checkpoint redo LSN. I wonder whether having
> an outdated version of pgstat.stat is worse than not having it at all?
> A comment in xlog.c says: "When starting with crash recovery, reset
> pgstat data - it might not be valid." But why might it be invalid? We
> write it first to a temp file and then rename it. Maybe fsync() should
> be added here and durable_rename() should be used instead of rename().
> But that still seems better than losing statistics, and it should not
> add any significant overhead (because it happens only at shutdown). In
> your case we are checking the LSN of the pgstat.stat file. But once
> again - why is it better to discard the file than to load the version
> from the previous checkpoint?
>

I think those are two independent issues - knowing that the snapshot is
from the last checkpoint, and knowing that it's correct (not corrupted).
And yeah, we should be careful about fsync/durable_rename.
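
Just to make the suggestion concrete, the pattern Konstantin describes
would look roughly like the sketch below. This is an illustrative
sketch only, not the actual pgstat_write_statsfile() code; the function
name is made up, while the file-name macros and the fd.c routines are
the existing ones. durable_rename() fsyncs the new file, performs the
rename and fsyncs the parent directory, so a crash can't leave a torn
pgstat.stat behind:

#include "postgres.h"

#include "pgstat.h"
#include "storage/fd.h"

/*
 * Illustrative sketch only: write the stats into a temporary file,
 * then move it into place durably instead of using a plain rename().
 */
static void
write_statsfile_durably(void)
{
    const char *tmpfile = PGSTAT_STAT_PERMANENT_TMPFILE;   /* pg_stat/pgstat.tmp */
    const char *statfile = PGSTAT_STAT_PERMANENT_FILENAME; /* pg_stat/pgstat.stat */
    FILE       *fp;

    fp = AllocateFile(tmpfile, PG_BINARY_W);
    if (fp == NULL)
    {
        ereport(LOG,
                (errcode_for_file_access(),
                 errmsg("could not open temporary statistics file \"%s\": %m",
                        tmpfile)));
        return;
    }

    /* ... serialize the stats entries (and the redo LSN) into fp ... */

    FreeFile(fp);

    /* on failure this logs at elevel LOG and returns a non-zero value */
    (void) durable_rename(tmpfile, statfile, LOG);
}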

> 2. The second question is also related to 1). So we saved pgstat.stat
> at a checkpoint, then did some updates, and then Postgres crashed. As
> far as I understand, with your patch we will successfully restore the
> pgstat.stat file. But it is not actually up-to-date: it doesn't reflect
> information about the recent updates. If it was so critical to keep
> pgstat.stat up-to-date, then why do we allow restoring the state from
> the most recent checkpoint?
>

Yeah, I was wondering about the same thing - can't this mean we fail to
start autovacuum? Let's say we delete a significant fraction of a huge
table, but then we crash before the next checkpoint. With this patch we
restore the last stats snapshot, which can easily have

n_dead_tup | 0
n_mod_since_analyze | 0

for the table. And thus we fail to notice the table needs autovacuum.
AFAIK we run autovacuum on all tables with missing stats (or am I
wrong?). That's what's happening on replicas after switchover/failover
too, right?
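
To make that concrete, the trigger condition autovacuum applies is
roughly the following (a simplified sketch of what
relation_needs_vacanalyze() in autovacuum.c computes, using the default
GUC values; the helper name is made up):

#include "postgres.h"

/*
 * Simplified sketch of the autovacuum trigger condition, assuming the
 * default autovacuum_vacuum_threshold = 50 and
 * autovacuum_vacuum_scale_factor = 0.2.
 */
static bool
table_needs_autovacuum(float4 reltuples, int64 n_dead_tup)
{
    float4      vacthresh = 50 + 0.2 * reltuples;

    return n_dead_tup > vacthresh;
}

With the restored snapshot reporting n_dead_tup = 0, this stays false
no matter how large a fraction of the table we deleted right before
the crash, until new activity bumps the counters again.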

It wouldn't be such an issue if we updated stats during recovery, but I
don't think we're doing that. Perhaps we should, which might also help
on replicas - no idea if it's feasible, though.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
