PoC: history of recent vacuum/checkpoint runs (using new hooks)

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: PoC: history of recent vacuum/checkpoint runs (using new hooks)
Date: 2024-12-21 20:27:54
Message-ID: b68ab452-c41f-4d04-893f-eaab84f1855b@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Our runtime stats system is great, but it only keeps a snapshot of
cumulative stats / snapshot. And while that works for user workloads, it
may not be quite sufficient when analyzing maintenance operations like
vacuum/checkpoint, etc.

For those operations it's useful to have information about individual
runs, not just the cumulative counters (or even deltas between regular
snapshots). There's also the issue that we only keep a very limited
subset of available information - just look at the info included in
VACUUM VERBOSE or with log_checkpoints=on, and how little of that is
available in pg_stats_. For vacuum we have the vacuum/analyze counts,
and timestamp of the last operation, but that's it. VACUUM VERBOSE
provides way more information, but we can only guess based on the
regular pgstat counters.

Yes, we can get this info written to server log using log_checkpoints
and log_autovacuum_min_duration (AFAIK there's no way to ensure logging
for manual VACUUM). But processing this information is not particularly
convenient, as it requires parsing the log, the message format is
suitable more for humans, etc. And it'd be very convenient to be able to
query this information, just like the other pgstat catalogs.

I wonder if we might/should do two things, to improve this:

1) Introduce hooks that allow doing some custom stuff with info about
those actions, after logging it. The attached 0001 and 0002 patches do
this for vacuum and checkpoint.

2) Allow keeping information about events. The 0003 patch does that in
an extension, leveraging the new hooks, but it'd certainly possible to
do in core too.

I realize our current pgstat collector is keeping per-object stats, not
per-event. We might add this to per-object stats (e.g. each table would
have stats about vacuum runs), but that wouldn't work for checkpoints.
There's also the question of memory consumption - I'm sure we don't want
to keep infinite history of vacuum runs, for example.

So the extension simply maintains a fixed-size circular queue, i.e. when
it gets full it starts evicting oldest entries. 1MB is enough for
storing ~4k VACUUM runs - I'm sure it can be made more compact.

I don't think there's a better way to do this. I've considered if this
might be done using emit_log_hook, but (a) that only helps when we end
up logging the event (and I'd like to do this always), and (b) it'd
require parsing the server log. So it's not much better than just doing
that, I think ...

Opinions?

--
Tomas Vondra

Attachment Content-Type Size
v1-0001-vacuum_log_hook.patch text/x-patch 7.7 KB
v1-0002-checkpoint_log_hook.patch text/x-patch 5.0 KB
v1-0003-stats_history-extension.patch text/x-patch 39.0 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andreas Karlsson 2024-12-21 21:45:25 Re: SQL Property Graph Queries (SQL/PGQ)
Previous Message Michail Nikolaev 2024-12-21 18:00:01 Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements