Re: New strategies for freezing, advancing relfrozenxid early

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Nikita Malakhov <hukutoc(at)gmail(dot)com>
Cc: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: New strategies for freezing, advancing relfrozenxid early
Date: 2022-12-17 02:44:11
Message-ID: CAH2-WznEYuC48DauOgZjAk6mJgKOBRprzrMtFn9X4x9OT8pQ_A@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 15, 2022 at 11:59 PM Nikita Malakhov <hukutoc(at)gmail(dot)com> wrote:
> I've found this discussion very interesting, since vacuuming
> TOAST tables is always a problem: these tables tend to bloat
> very quickly with dead data. Just to remind, all TOAST-able
> columns of a relation share that relation's single TOAST table,
> and TOASTed data are not updated; there are only insert and
> delete operations.

I don't think that it would be any different to any other table that
happened to have lots of inserts and deletes, such as the table
described here:

https://wiki.postgresql.org/wiki/Freezing/skipping_strategies_patch:_motivating_examples#Mixed_inserts_and_deletes

In the real world, a table like this would probably consist of some
completely static data, combined with other data that is constantly
deleted and re-inserted -- probably only a small fraction of the table
at any one time. I would expect such a table to work quite well,
because the static pages would all become frozen (at least after a
while), leaving behind only the tuples that are quickly deleted, most
of the time. As bloat is cleaned up, VACUUM would have a decent chance
of noticing that advancing relfrozenxid has become cheap, even in
earlier VACUUM operations. Even a VACUUM that runs long before
autovacuum.c would launch an antiwraparound autovacuum has a decent
chance of doing so. That's not a new idea, really; the
pgbench_branches example from the Wiki page already looks like that,
and even works on Postgres 15.
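To make the "cheap to advance relfrozenxid" idea concrete, here is a rough C sketch. It assumes (my simplification, not code from the patch) that the "extra" pages are the ones the visibility map marks all-visible but not all-frozen: a VACUUM that only cares about bloat can skip them, but one that wants to advance relfrozenxid has to scan them. The PageVisState enum and count_extra_pages() are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified per-page visibility map state. In the real
 * visibility map all-frozen implies all-visible; here each page gets
 * exactly one of three states for illustration.
 */
typedef enum
{
    PAGE_NOT_ALL_VISIBLE,   /* must be scanned anyway (e.g. dead tuples) */
    PAGE_ALL_VISIBLE,       /* skippable for bloat, may hold unfrozen XIDs */
    PAGE_ALL_FROZEN         /* skippable, no obstacle to relfrozenxid */
} PageVisState;

/*
 * Count the pages a VACUUM would have to scan, beyond those it scans
 * anyway to remove bloat, in order to advance relfrozenxid.
 * Assumption: these are exactly the all-visible but not all-frozen pages.
 */
size_t
count_extra_pages(const PageVisState *vm, size_t rel_pages)
{
    size_t      extra = 0;

    assert(vm != NULL || rel_pages == 0);

    for (size_t blkno = 0; blkno < rel_pages; blkno++)
    {
        if (vm[blkno] == PAGE_ALL_VISIBLE)
            extra++;
    }
    return extra;
}
```

With a table like the one described above (mostly frozen static pages plus a small churning fraction), nearly every page is either all-frozen or not all-visible, so the extra count stays small relative to rel_pages.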

Here is the part that's new: the pressure to advance relfrozenxid
grows gradually as table age grows. While table age is still very
young, we'll only do it if the number of "extra" scanned pages is
< 5% of rel_pages, i.e. only when the added cost is very low (again,
mostly like the pgbench_branches example). Once table age gets about
halfway to the point at which antiwraparound autovacuuming is
required, VACUUM gradually worries less about the cost and more about
the need to advance relfrozenxid. Ideally this will happen before an
antiwraparound autovacuum is actually required.
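A rough C sketch of that sliding cost threshold follows. Only the 5% floor for young tables and the "about halfway" point come from the description above. The linear ramp from 5% up to 100% between the halfway point and freeze_max_age (assumed here to play the role of autovacuum_freeze_max_age), and the function name should_advance_relfrozenxid(), are hypothetical, not the patch's actual logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: decide whether this VACUUM should pay for scanning
 * the "extra" pages needed to advance relfrozenxid. The 5% floor and the
 * halfway point follow the description; the linear ramp is an assumption.
 */
bool
should_advance_relfrozenxid(uint32_t table_age, uint32_t freeze_max_age,
                            uint64_t extra_pages, uint64_t rel_pages)
{
    double      age_frac;
    double      max_extra_frac;

    assert(freeze_max_age > 0);

    if (rel_pages == 0)
        return true;            /* empty table: advancing is free */

    age_frac = (double) table_age / freeze_max_age;

    if (age_frac >= 1.0)
        return true;            /* antiwraparound territory: must advance */

    if (age_frac < 0.5)
        max_extra_frac = 0.05;  /* young table: only when very cheap */
    else
        /* ramp linearly from 5% at halfway to 100% at freeze_max_age */
        max_extra_frac = 0.05 + (age_frac - 0.5) / 0.5 * 0.95;

    return (double) extra_pages / rel_pages < max_extra_frac;
}
```

For example, with freeze_max_age at 200 million, a table of age 10 million would accept 4 extra pages out of 100 but not 6, while a table of age 150 million would accept up to roughly half the table.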

I'm not sure how much this would help with bloat, though I suspect
that it could make a big difference with the right workload. If you
always need frequent autovacuums just to deal with bloat, then there
is never a good time to run an aggressive antiwraparound autovacuum.
An aggressive autovacuum (AV) will probably take much longer than the
typical autovacuum that deals with bloat. In theory the aggressive AV
removes as much bloat as any other AV, but that might not help much.
If the aggressive AV takes as long as (say) 5 regular autovacuums
would have taken, and you really needed those 5 separate autovacuums
to run just to keep up with the bloat, then that's a real problem:
with such a workload, the aggressive AV effectively causes bloat.
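The arithmetic behind that last point can be illustrated with a toy model in C. It assumes (a simplification, not how PostgreSQL actually tracks dead tuples) that the workload produces a constant number of dead tuples per time unit, that VACUUMs run back to back, and that each VACUUM can only remove tuples that were already dead when it started. peak_dead_tuples() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model: each time unit adds dead_per_unit dead tuples. VACUUMs run
 * back to back, each taking vacuum_units time units, and each removes only
 * the tuples that were already dead when it started. Returns the peak
 * number of dead tuples seen over total_units time units.
 */
uint64_t
peak_dead_tuples(uint64_t dead_per_unit, unsigned vacuum_units,
                 unsigned total_units)
{
    uint64_t    dead = 0;       /* dead tuples currently in the table */
    uint64_t    removable = 0;  /* dead tuples the running VACUUM can remove */
    uint64_t    peak = 0;

    assert(vacuum_units > 0);

    for (unsigned t = 0; t < total_units; t++)
    {
        if (t % vacuum_units == 0)
        {
            /* the previous VACUUM (if any) finishes; a new one starts */
            dead -= removable;
            removable = dead;
        }
        dead += dead_per_unit;
        if (dead > peak)
            peak = dead;
    }
    return peak;
}
```

In this model the steady-state peak is 2 * vacuum_units * dead_per_unit, so a VACUUM that takes 5 times as long lets the peak dead tuple count grow 5 times as large.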

--
Peter Geoghegan
