Quick Links

Re: PoC: history of recent vacuum/checkpoint runs (using new hooks)

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	Michael Paquier <michael(at)paquier(dot)xyz>
Cc:	wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Robert Treat <rob(at)xzilla(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: PoC: history of recent vacuum/checkpoint runs (using new hooks)
Date:	2024-12-28 01:25:16
Message-ID:	00d42567-22bd-4ed2-b190-188e9c151e28@vondra.me
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 12/27/24 05:00, Michael Paquier wrote:
> On Thu, Dec 26, 2024 at 06:58:11PM +0100, Tomas Vondra wrote:
>> If 128MB is insufficient, why would 256MB be OK? A factor of 2x does not
>> make a fundamental difference ...
>>
>> Anyway, the 128MB value is rather arbitrary. I don't mind increasing the
>> limit, or possibly removing it entirely (and accepting anything the
>> system can handle).
>
> + DefineCustomIntVariable("stats_history.size",
> + "Sets the amount of memory available for past events.",
>
> How about some time-based retention? Data size can be hard to think
> about for the end user, while there would be a set of users that would
> want to retain data for the past week, month, etc? If both size and
> time upper-bound are define, then entries that match one condition or
> the other are removed.
>

Right. In my response [1] I suggested replacing the simple memory limit
with a time-based limit, but I haven't done anything about that yet.

And the more I think about it the more I'm convinced we don't need to
keep the data about past runs in memory, a file should be enough (except
maybe for a small buffer). That would mean we don't need to worry about
dynamic shared memory, etc. I initially rejected this because it seemed
like a regression to how pgstat worked initially (sharing data through
files), but I don't think that's actually true - this data is different
(almost append-only), etc.

The one case when we may need co read the data is in response to DROP of
a table, when we need to discard entries for that object. Or we could
handle that by recording OIDs of dropped objects ... not sure how
complex this would need to be.

We'd still want to enforce some limits, of course - a time-based limit
of the minimum amount of time to cover, and maximum amount of disk space
to use (more as a safety).

FWIW there's one "issue" with enforcing the time-based limit - we can
only do that for the "end time", because that's when we see the entry.
If you configure the limit to keep "1 day" history, and then a vacuum
completes after running for 2 days, we want to keep it, so that anyone
can actually see it.

[1]
https://www.postgresql.org/message-id/8df7cee1-31aa-4db3-bbb7-83157ca139da%40vondra.me

> + checkpoint_log_hook(
> + CheckpointStats.ckpt_start_t, /* start_time */
> + CheckpointStats.ckpt_end_t, /* end_time */
> + (flags & CHECKPOINT_IS_SHUTDOWN), /* is_shutdown */
> + (flags & CHECKPOINT_END_OF_RECOVERY), /* is_end_of_recovery */
> + (flags & CHECKPOINT_IMMEDIATE), /* is_immediate */
> + (flags & CHECKPOINT_FORCE), /* is_force */
> + (flags & CHECKPOINT_WAIT), /* is_wait */
> + (flags & CHECKPOINT_CAUSE_XLOG), /* is_wal */
> + (flags & CHECKPOINT_CAUSE_TIME), /* is_time */
> + (flags & CHECKPOINT_FLUSH_ALL), /* is_flush_all */
> + CheckpointStats.ckpt_bufs_written, /* buffers_written */
> + CheckpointStats.ckpt_slru_written, /* slru_written */
> + CheckpointStats.ckpt_segs_added, /* segs_added */
> + CheckpointStats.ckpt_segs_removed, /* segs_removed */
> + CheckpointStats.ckpt_segs_recycled, /* segs_recycled */
>
> That's a lot of arguments. CheckpointStatsData and the various
> CHECKPOINT_* flags are exposed, why not just send these values to the
> hook?
>
> For v1-0001 as well, I'd suggest some grouping with existing
> structures, or expose these structures so as they can be reused for
> out-of-core code via the proposed hook. More arguments lead to more
> mistakes that could be easily avoided.

Yes, I admit the number of parameters seemed a bit annoying to me too,
and maybe we could reduce that somewhat. Certainly for checkpoints,
where we already have a reasonable CheckpointStats struct and flags,
wrapping most of the fields.

With vacuum it's a bit more complicated, for two reasons: (a) the
counters are simply in LVRelState, mixed with all kinds of other info,
and it seems "not great" to pass it to a "log" hook, and (b) there are
some calculated values, so I guess the hook would need to do that
calculation on it's own, and it'd be easy to diverge over time.

For (a) we could introduce some "stats" struct to keep these counters
for vacuum (the nearby parallel vacuum patch actually does something
like that, I think). Not sure if (b) is actually a problem in practice,
but I guess we could add those fields to the new "stats" struct too.

regards

--
Tomas Vondra

In response to

Re: PoC: history of recent vacuum/checkpoint runs (using new hooks) at 2024-12-27 04:00:37 from Michael Paquier

Responses

Re: PoC: history of recent vacuum/checkpoint runs (using new hooks) at 2024-12-29 15:39:13 from Robert Treat
Re: PoC: history of recent vacuum/checkpoint runs (using new hooks) at 2024-12-31 01:06:45 from Michael Paquier

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tomas Vondra	2024-12-28 01:48:22	Re: Add the ability to limit the amount of memory that can be allocated to backends.
Previous Message	David Christensen	2024-12-28 00:15:59	Re: [PATCHES] Post-special page storage TDE support