Re: Count and log pages set all-frozen by vacuum

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Peter Geoghegan <pg(at)bowt(dot)ie>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Count and log pages set all-frozen by vacuum
Date: 2024-12-17 00:40:13
Message-ID: CAAKRu_bWMa-wakEmFfB3h-CEJSuh2jAqTgYBRpV99cpovY4MXQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 12, 2024 at 9:39 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>
> On 12/11/24 20:18, Masahiko Sawada wrote:
> >
> > ...
> >
> >> Here's an example to exercise the new log message:
> >>
> >> create table foo (a int, b int) with (autovacuum_enabled = false);
> >> insert into foo select generate_series(1,1000), 1;
> >> delete from foo where a > 500;
> >> vacuum (verbose) foo;
> >> -- visibility map: 5 pages newly set all-visible, of which 2 set
> >> all-frozen. 0 all-visible pages newly set all-frozen.
> >> insert into foo select generate_series(1,500), 1;
> >> vacuum (verbose, freeze) foo;
> >> --visibility map: 3 pages newly set all-visible, of which 3 set
> >> all-frozen. 2 all-visible pages newly set all-frozen.

> I agree the v4 patch is fine, although I find the wording with multiple
> "newly" a bit verbose / confusing. Maybe like this would be better:
>
> %u pages set all-visible, %u set all-frozen (%u were all-visible)
>
> I don't want to drag this thread into an infinite discussion about the
> best possible wording, so if others think the v4 wording is fine, I
> won't object to it.

I changed it to use your suggested wording.

> For me the bigger questions is whether these new counters added to te
> vacuum log message are actually useful in practice. I understand it may
> be useful while working on a patch related to eager freezing (although I
> think Melanie changed the approach of that patch series, and it doesn't
> actually require this patch anymore).
>
> But I'm a bit skeptical about it being useful for regular users or DBAs.
> I certainly don't remember me wanting to know these values per-vacuum.
> Of course, maybe that's bias - knowing it's not available and thus not
> asking for it. But I think it's also very hard to make conclusions about
> the "freeze debt" from these per-vacuum values - we don't know how the
> values combine. It might be disjunct set of pages (and then we should
> just sum them), or maybe it's the same set of pages over and over again
> (and then the debt doesn't really change).

I agree that it won't fully tell you the freeze debt over time, but,
for example, you can look at the number of scanned pages and the total
number of pages and if the number of pages being set all-visible stays
the same each time but the total number of pages in the relation grows
(so % scanned shrinks) that likely means you are setting new pages
all-visible.

> It doesn't mean it's useless - e.g. we might compare the sum to a delta
> of values from visibility_map_summary() and make some deductions about
> how often are pages set all-visible repeatedly. And maybe vacuum should
> log those before/after visibility_count values too, to make this easier.
> Not sure how costly would that be ...

Right, it is useful to know how many are all-visible at the start and
finish of vacuum. But that is pretty different from the other vacuum
log output. The rest of it tells you what the vacuum did, while this
would tell you how the table has changed state during the vacuum.

> To me these visibility_count seem more important when quantifying the
> freeze debt for a given table. But I think that also needs to consider
> how old those all-visible pages are - the older the more it contributes
> to the debt, I think. And that won't be in the vacuum log, of course.

Yes, to keep track of how old the all-visible pages are, you need
something like a histogram or other data structure to represent the
ranges of ages of all-visible pages -- then update them when a page is
modified. I definitely think this would be useful. But I ran into the
same space issues I mention below. You need a killer use case to
justify introducing something that takes up hundreds of bytes per
table (and introduces new APIs).

> Another thing is that analyzing vacuum log messages is ... not very
> easy. Having to parse multi-line log messages, with a mix of text and
> fields (not even mentioning translations) is not fun. Of course, none of
> that is a fault of this patch, and I don't expect this patch to fix
> that. But it's hard to get excited about new fields added to this log
> message, when it'd be most useful aggregated for vacuums over some time
> interval. I really wish we had some way to collect and access these
> runtime stats in a structured way.

Yes, I also wish we could have a new type of stat that is over time
but decaying (a ring buffer of recent happenings). I did several
versions of things like this in draft patches for the eager freeze
heuristic. The problem is that all of these take up space -- even if
you just keep stats for the last 3 vacuums, that's dozens-hundreds of
bytes per table for all tables. I don't see how, even with decaying
them, we could afford to keep them in memory. So, we would probably
want to have some sort of new on-disk stat type that isn't a table.

- Melanie

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andreas Karlsson 2024-12-17 00:46:18 Re: pure parsers and reentrant scanners
Previous Message Michael Paquier 2024-12-17 00:25:38 Re: pg_combinebackup PITR comparison test fix