Re: Flush pgstats file during checkpoints

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Flush pgstats file during checkpoints
Date: 2024-07-23 03:52:11
Message-ID: Zp8o6_cl0KSgsnvS@paquier.xyz
Lists: pgsql-hackers

On Mon, Jul 22, 2024 at 07:01:41AM +0000, Bertrand Drouvot wrote:
> 1 ===
> Not related with your patch but this comment in the GetRedoRecPtr() function:
>
> * grabbed a WAL insertion lock to read the authoritative value in
> * Insert->RedoRecPtr
>
> sounds weird. Shouldn't that be s/Insert/XLogCtl/?

No, the comment is right. We are retrieving a copy of
Insert->RedoRecPtr here.

> 2 ===
>
> + /* Write the redo LSN, used to cross check the file loaded */
>
> Nit: s/loaded/read/?

WFM.

> 3 ===
>
> + /*
> + * Read the redo LSN stored in the file.
> + */
> + if (!read_chunk_s(fpin, &file_redo) ||
> + file_redo != redo)
> + goto error;
>
> I wonder if it would make sense to have dedicated error messages for
> "file_redo != redo" and for "format_id != PGSTAT_FILE_FORMAT_ID". That would
> make it easier to diagnose why the stats file is discarded.

Yep. This has been bugging me quite a bit, and it goes beyond just
the format ID or the redo LSN: it applies to all the read_chunk()
callers. I've taken a shot at this with patch 0001, implemented on
top of the rest. I have also adjusted the read of the redo LSN to
provide more error context, now in 0002.
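
To illustrate the idea of giving each failure its own message, here is a
minimal standalone sketch. It is not the patch: DemoLsn, DEMO_FORMAT_ID,
demo_read_chunk() and demo_check_header() are hypothetical names made up
for this example, and fprintf() stands in for whatever logging the real
code uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; not the PostgreSQL definitions. */
typedef uint64_t DemoLsn;
#define DEMO_FORMAT_ID 0x01A5BCAFu

typedef enum
{
    DEMO_OK,
    DEMO_SHORT_READ,
    DEMO_BAD_FORMAT,
    DEMO_REDO_MISMATCH
} DemoStatus;

/* Read exactly len bytes; returns false on a short read or I/O error. */
static bool
demo_read_chunk(FILE *fp, void *dst, size_t len)
{
    return fread(dst, 1, len, fp) == len;
}

/*
 * Validate the file header, reporting a distinct message for each reason
 * the file could be rejected instead of jumping to one generic error path.
 */
static DemoStatus
demo_check_header(FILE *fp, DemoLsn expected_redo)
{
    uint32_t    format_id;
    DemoLsn     file_redo;

    assert(fp != NULL);

    if (!demo_read_chunk(fp, &format_id, sizeof(format_id)))
    {
        fprintf(stderr, "could not read format ID\n");
        return DEMO_SHORT_READ;
    }
    if (format_id != DEMO_FORMAT_ID)
    {
        fprintf(stderr, "found format ID %X, expected %X\n",
                (unsigned int) format_id, (unsigned int) DEMO_FORMAT_ID);
        return DEMO_BAD_FORMAT;
    }
    if (!demo_read_chunk(fp, &file_redo, sizeof(file_redo)))
    {
        fprintf(stderr, "could not read redo LSN\n");
        return DEMO_SHORT_READ;
    }
    if (file_redo != expected_redo)
    {
        /* LSNs are printed as two 32-bit halves in hexadecimal. */
        fprintf(stderr, "found redo LSN %X/%X, expected %X/%X\n",
                (unsigned int) (file_redo >> 32), (unsigned int) file_redo,
                (unsigned int) (expected_redo >> 32),
                (unsigned int) expected_redo);
        return DEMO_REDO_MISMATCH;
    }
    return DEMO_OK;
}
```

A caller would open the stats file, pass the redo LSN it expects, and
discard the file on anything other than DEMO_OK; the message already
says which check failed.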

> Looking at 0003:
>
> 4 ===
>
> @@ -5638,10 +5634,7 @@ StartupXLOG(void)
> * TODO: With a bit of extra work we could just start with a pgstat file
> * associated with the checkpoint redo location we're starting from.
> */
> - if (didCrash)
> - pgstat_discard_stats();
> - else
> - pgstat_restore_stats(checkPoint.redo);
> + pgstat_restore_stats(checkPoint.redo);
>
> remove the TODO comment?

Pretty sure I've removed that more than once already, and that
this is a rebase accident. Thanks for noticing.

> 5 ===
>
> + * process) if the stats file has a redo LSN that matches with the .
>
> unfinished sentence?

This is missing a reference to the control file.

> 6 ===
>
> - * Should only be called by the startup process or in single user mode.
> + * This is called by the checkpointer or in single-user mode.
> */
> void
> -pgstat_discard_stats(void)
> +pgstat_flush_stats(XLogRecPtr redo)
> {
>
> Would that make sense to add an Assert in pgstat_flush_stats()? (checking what
> the above comment states).

There is already one in pgstat_write_statsfile(); I am not sure there
is a point in duplicating the assertion in both.

Attaching a new v4 series, with all these comments addressed.
--
Michael

Attachment                                                       Content-Type  Size
v4-0001-Revert-Test-that-vacuum-removes-tuples-older-than.patch  text/x-diff   12.3 KB
v4-0002-Add-more-debugging-information-when-reading-stats.patch  text/x-diff   4.7 KB
v4-0003-Add-redo-LSN-to-pgstats-file.patch                       text/x-diff   4.7 KB
v4-0004-Flush-pgstats-file-during-checkpoints.patch              text/x-diff   10.8 KB
