From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Flush pgstats file during checkpoints |
Date: | 2024-06-18 06:01:12 |
Message-ID: | ZnEiqAITL-VgZDoY@paquier.xyz |
Lists: | pgsql-hackers |
Hi all,
On HEAD, xlog.c has the following comment, which has been on my own
TODO list for a couple of weeks now:
* TODO: With a bit of extra work we could just start with a pgstat file
* associated with the checkpoint redo location we're starting from.
Please find attached a patch series that implements this, making it
possible to keep statistics after a crash rather than discarding them.
I looked at the code for a while before settling on the following:
- Forcing the flush of the pgstats file to happen during non-shutdown
checkpoint and restart points, after updating the control file's redo
LSN and the critical sections in the area.
- Leaving the current before_shmem_exit() callback around, as a matter
of delaying the flush of the stats for as long as possible in a
shutdown sequence. This also makes the single-user mode shutdown
simpler.
- Adding the redo LSN to the pgstats file, with a bump of
PGSTAT_FILE_FORMAT_ID, cross-checked with the redo LSN. This change
is useful on its own when loading stats after a clean startup, as
well.
- The crash recovery case is simplified, as there is no more need for
the "discard" code path.
- Not using logic that would require putting an LSN into the stats
file name, which would imply a lookup of the contents of pg_stat/ to
clean up past files at crash recovery. Such lookups should not be
costly, but I'd rather not add more of them.
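To illustrate the redo LSN cross-check from the list above, here is a minimal, self-contained sketch. It is not code from the patch: DemoLsn, DEMO_FILE_FORMAT_ID, demo_write_header() and demo_header_matches() are hypothetical names, the header layout is an assumption, and rejecting the file on a mismatch is my reading of "cross-checked" rather than confirmed behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins, not the actual PostgreSQL definitions. */
typedef uint64_t DemoLsn;
#define DEMO_FILE_FORMAT_ID 0x01A5BCADu

/* Write a header made of the format ID followed by the redo LSN. */
bool
demo_write_header(FILE *fp, DemoLsn redo)
{
    uint32_t format_id = DEMO_FILE_FORMAT_ID;

    assert(fp != NULL);
    if (fwrite(&format_id, sizeof(format_id), 1, fp) != 1)
        return false;
    if (fwrite(&redo, sizeof(redo), 1, fp) != 1)
        return false;
    return fflush(fp) == 0;
}

/*
 * Read the header back and accept the file only if both the format ID
 * and the stored redo LSN match what the caller expects.
 */
bool
demo_header_matches(FILE *fp, DemoLsn expected_redo)
{
    uint32_t format_id;
    DemoLsn  file_redo;

    assert(fp != NULL);
    if (fread(&format_id, sizeof(format_id), 1, fp) != 1 ||
        format_id != DEMO_FILE_FORMAT_ID)
        return false;
    if (fread(&file_redo, sizeof(file_redo), 1, fp) != 1)
        return false;
    return file_redo == expected_redo;
}
```

In this sketch a stats file written for one redo LSN is simply treated as unusable when the expected redo LSN differs, which is the case where the stats would otherwise be out of sync with the point recovery starts from.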
7ff23c6d277d, which removed the last call of CreateCheckPoint() from
the startup process, is older than 5891c7a8ed8f. Still, it seems to me
that pgstats relies on some areas of the code that don't make sense on
HEAD (see the locking mentioned at the top of the write routine, for
example). The logic becomes straightforward to reason about, as
restart points and checkpoints always run from the checkpointer,
implying that pgstat_write_statsfile() is already called only from the
postmaster in single-user mode or from the checkpointer itself, at
shutdown.
Attached is a patch set, with the last one being the actual feature
and some preparatory work before it:
- 0001 adds the redo LSN to the pgstats file flushed.
- 0002 adds an assertion in pgstat_write_statsfile(), to check from
where it is called.
- 0003 adds some DEBUG2 information about the redo LSN.
- 0004 is the meat of the thread.
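As a rough idea of what a caller check like the one in 0002 could look like, here is a standalone sketch. It does not reproduce the patch: DemoBackendType, demo_may_write_stats() and demo_write_statsfile() are hypothetical, and the set of allowed callers (checkpointer, or single-user mode) is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical process kinds; the real backend type enum differs. */
typedef enum DemoBackendType
{
    DEMO_B_BACKEND,         /* regular client backend */
    DEMO_B_CHECKPOINTER,    /* checkpointer process */
    DEMO_B_STANDALONE       /* single-user mode */
} DemoBackendType;

/* Counts successful calls, only so the sketch has an observable effect. */
int demo_write_calls = 0;

/* Only the checkpointer or a single-user process may write the file. */
bool
demo_may_write_stats(DemoBackendType caller)
{
    return caller == DEMO_B_CHECKPOINTER || caller == DEMO_B_STANDALONE;
}

/* Stand-in for the write routine: check the caller, then "write". */
void
demo_write_statsfile(DemoBackendType caller)
{
    assert(demo_may_write_stats(caller));
    demo_write_calls++;
}
```

The point of such an assertion is to document the single-writer assumption in code, so that a future caller added elsewhere fails fast in assert-enabled builds.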
I am adding this to the next CF. Thoughts and comments are welcome.
Thanks,
--
Michael
Attachment | Content-Type | Size |
---|---|---|
0001-Add-redo-LSN-to-pgstats-file.patch | text/x-diff | 4.5 KB |
0002-Add-assertion-in-pgstat_write_statsfile.patch | text/x-diff | 1017 bytes |
0003-Add-some-DEBUG2-information-about-the-redo-LSN-of-th.patch | text/x-diff | 1.4 KB |
0004-Flush-pgstats-file-during-checkpoints.patch | text/x-diff | 9.3 KB |