From: | Robert Treat <rob(at)xzilla(dot)net> |
---|---|
To: | Tomas Vondra <tomas(at)vondra(dot)me> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: PoC: history of recent vacuum/checkpoint runs (using new hooks) |
Date: | 2024-12-29 15:39:13 |
Message-ID: | CAJSLCQ1dAmjuj=kV3yXvy3QDWpYpy+TDGgr3LGpLSh96HwYkFA@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Dec 27, 2024 at 8:25 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> On 12/27/24 05:00, Michael Paquier wrote:
> > On Thu, Dec 26, 2024 at 06:58:11PM +0100, Tomas Vondra wrote:
> >> If 128MB is insufficient, why would 256MB be OK? A factor of 2x does not
> >> make a fundamental difference ...
> >>
> >> Anyway, the 128MB value is rather arbitrary. I don't mind increasing the
> >> limit, or possibly removing it entirely (and accepting anything the
> >> system can handle).
> >
> > + DefineCustomIntVariable("stats_history.size",
> > + "Sets the amount of memory available for past events.",
> >
> > How about some time-based retention? Data size can be hard to think
> > about for the end user, while there would be a set of users that would
> > want to retain data for the past week, month, etc? If both size and
> > time upper-bound are defined, then entries that match one condition or
> > the other are removed.
> >
>
> Right. In my response [1] I suggested replacing the simple memory limit
> with a time-based limit, but I haven't done anything about that yet.
>
> And the more I think about it the more I'm convinced we don't need to
> keep the data about past runs in memory, a file should be enough (except
> maybe for a small buffer). That would mean we don't need to worry about
> dynamic shared memory, etc. I initially rejected this because it seemed
> like a regression to how pgstat worked initially (sharing data through
> files), but I don't think that's actually true - this data is different
> (almost append-only), etc.
>
> The one case when we may need to read the data is in response to DROP of
> a table, when we need to discard entries for that object. Or we could
> handle that by recording OIDs of dropped objects ... not sure how
> complex this would need to be.
>
> We'd still want to enforce some limits, of course - a time-based limit
> of the minimum amount of time to cover, and maximum amount of disk space
> to use (more as a safety).
>
> FWIW there's one "issue" with enforcing the time-based limit - we can
> only do that for the "end time", because that's when we see the entry.
> If you configure the limit to keep "1 day" history, and then a vacuum
> completes after running for 2 days, we want to keep it, so that anyone
> can actually see it.
>
I can't say I recall all the reasoning involved in making
pg_stat_statements just be based on a fixed number of entries, but the
ability to come up with corner cases was certainly a factor. For
example, imagine the scenario where you set a max at 30 days, but you
have some tables only being vacuumed every few months. Ideally you
probably want the last entry no matter what, and honestly probably the
last 2 (in case you are troubleshooting something, having the last run
and something to compare against is ideal). In any case, it can get
complicated pretty quickly.
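
To make the corner case concrete, here is a minimal sketch of a retention rule that combines an age limit on the end time with a "keep the last N runs per table" floor. Everything in it (HistoryEntry, apply_retention, the field names, plain integer timestamps) is hypothetical and not taken from the patch; it only illustrates the policy, not how the extension would store or scan entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical history entry; names are illustrative, not from the patch. */
typedef struct HistoryEntry
{
    uint32_t relid;     /* table the run was for */
    int64_t  end_time;  /* completion time, in seconds */
    bool     keep;      /* output: set by apply_retention() */
} HistoryEntry;

/*
 * Decide which entries survive. The array must be sorted by end_time,
 * oldest first. An entry is kept if it finished within max_age of "now",
 * or if fewer than min_per_rel newer entries exist for the same table.
 * Returns the number of entries kept.
 */
size_t
apply_retention(HistoryEntry *entries, size_t n,
                int64_t now, int64_t max_age, int min_per_rel)
{
    size_t kept = 0;

    assert(min_per_rel >= 0);

    for (size_t i = 0; i < n; i++)
    {
        int newer_same_rel = 0;

        /* Count newer runs of the same table (quadratic, fine for a sketch). */
        for (size_t j = i + 1; j < n; j++)
        {
            if (entries[j].relid == entries[i].relid)
                newer_same_rel++;
        }

        entries[i].keep = (now - entries[i].end_time <= max_age) ||
                          (newer_same_rel < min_per_rel);
        if (entries[i].keep)
            kept++;
    }

    return kept;
}
```

With min_per_rel = 2, a table vacuumed only every few months still has its last two runs available for comparison, even if both are far older than the age limit. The age check uses the end time only, which matches the point above that a run is first seen when it completes.
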
> [1]
> https://www.postgresql.org/message-id/8df7cee1-31aa-4db3-bbb7-83157ca139da%40vondra.me
>
> > + checkpoint_log_hook(
> > + CheckpointStats.ckpt_start_t, /* start_time */
> > + CheckpointStats.ckpt_end_t, /* end_time */
> > + (flags & CHECKPOINT_IS_SHUTDOWN), /* is_shutdown */
> > + (flags & CHECKPOINT_END_OF_RECOVERY), /* is_end_of_recovery */
> > + (flags & CHECKPOINT_IMMEDIATE), /* is_immediate */
> > + (flags & CHECKPOINT_FORCE), /* is_force */
> > + (flags & CHECKPOINT_WAIT), /* is_wait */
> > + (flags & CHECKPOINT_CAUSE_XLOG), /* is_wal */
> > + (flags & CHECKPOINT_CAUSE_TIME), /* is_time */
> > + (flags & CHECKPOINT_FLUSH_ALL), /* is_flush_all */
> > + CheckpointStats.ckpt_bufs_written, /* buffers_written */
> > + CheckpointStats.ckpt_slru_written, /* slru_written */
> > + CheckpointStats.ckpt_segs_added, /* segs_added */
> > + CheckpointStats.ckpt_segs_removed, /* segs_removed */
> > + CheckpointStats.ckpt_segs_recycled, /* segs_recycled */
> >
> > That's a lot of arguments. CheckpointStatsData and the various
> > CHECKPOINT_* flags are exposed, why not just send these values to the
> > hook?
> >
> > For v1-0001 as well, I'd suggest some grouping with existing
> > structures, or expose these structures so as they can be reused for
> > out-of-core code via the proposed hook. More arguments lead to more
> > mistakes that could be easily avoided.
>
> Yes, I admit the number of parameters seemed a bit annoying to me too,
> and maybe we could reduce that somewhat. Certainly for checkpoints,
> where we already have a reasonable CheckpointStats struct and flags,
> wrapping most of the fields.
>
> With vacuum it's a bit more complicated, for two reasons: (a) the
> counters are simply in LVRelState, mixed with all kinds of other info,
> and it seems "not great" to pass it to a "log" hook, and (b) there are
> some calculated values, so I guess the hook would need to do that
> calculation on its own, and it'd be easy to diverge over time.
>
> For (a) we could introduce some "stats" struct to keep these counters
> for vacuum (the nearby parallel vacuum patch actually does something
> like that, I think). Not sure if (b) is actually a problem in practice,
> but I guess we could add those fields to the new "stats" struct too.
>
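
For reference, the struct-based hook suggested above could look roughly like the sketch below: one pointer to the stats struct plus the raw flags word, instead of one argument per field. All names here carry a Demo/demo prefix because they are stand-ins; the struct is a cut-down placeholder for CheckpointStatsData, and the flag values are made up for the example rather than copied from the PostgreSQL headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder flag bits; the values are illustrative only. */
#define DEMO_CHECKPOINT_IS_SHUTDOWN 0x0001
#define DEMO_CHECKPOINT_FORCE       0x0008

/* Cut-down stand-in for the checkpoint stats struct. */
typedef struct DemoCheckpointStats
{
    long start_t;
    long end_t;
    int  bufs_written;
    int  segs_added;
} DemoCheckpointStats;

/* The hook receives the whole struct and the flags, not a dozen scalars. */
typedef void (*demo_checkpoint_log_hook_type) (const DemoCheckpointStats *stats,
                                               int flags);

demo_checkpoint_log_hook_type demo_checkpoint_log_hook = NULL;

/* Core side: call the hook only if an extension has installed one. */
void
demo_report_checkpoint(const DemoCheckpointStats *stats, int flags)
{
    assert(stats != NULL);

    if (demo_checkpoint_log_hook)
        demo_checkpoint_log_hook(stats, flags);
}

/* Extension side: pick out whatever fields and flags it cares about. */
int  demo_last_bufs_written = -1;
bool demo_last_was_shutdown = false;

void
demo_history_callback(const DemoCheckpointStats *stats, int flags)
{
    demo_last_bufs_written = stats->bufs_written;
    demo_last_was_shutdown = (flags & DEMO_CHECKPOINT_IS_SHUTDOWN) != 0;
}
```

Adding a counter later then only touches the struct, not every hook implementation's signature, which is the "fewer mistakes" argument made above. A vacuum "stats" struct, if one gets introduced, could presumably be passed the same way.
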
At the risk of increasing scope, since you already are working on
checkpoints along with vacuums, I'm curious if there was a reason not
to do analyze stats retention as well? It seems closely related,
covering the same area and problems as vacuum history.
Robert Treat
https://xzilla.net