Re: Defining (and possibly skipping) useless VACUUM operations

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Chris Travers <chris(dot)travers(at)gmail(dot)com>
Subject: Re: Defining (and possibly skipping) useless VACUUM operations
Date: 2021-12-14 14:05:38
Message-ID: CA+Tgmobz7pvZMmn17cjYni9Bzy_NFG__-0DOOJ7eHNFPBJAFoA@mail.gmail.com
Lists: pgsql-hackers

On Sun, Dec 12, 2021 at 8:47 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I am currently working on decoupling advancing relfrozenxid from tuple
> freezing [1]. That is, I'm teaching VACUUM to keep track of
> information that it uses to generate an "optimal value" for the
> table's final relfrozenxid: the most recent XID value that might still
> be in the table. This patch is based on the observation that we don't
> actually have to use the FreezeLimit cutoff for our new
> pg_class.relfrozenxid. We need only obey the basic relfrozenxid
> invariant, which is that the final value must be <= any extant XID in
> the table. Using FreezeLimit is needlessly conservative.

Right.

> It now occurs to me to push this patch in another direction, on top of
> all that: the OldestXmin behavior hints at a precise, robust way of
> defining "useless vacuuming". We can condition skipping a VACUUM (i.e.
> whether a VACUUM is considered "definitely won't be useful if allowed
> to execute") on whether or not our preexisting pg_class.relfrozenxid
> precisely equals our newly-acquired OldestXmin for an about-to-begin
> VACUUM operation. (We'd also want to add an "unchangeable
> pg_class.relminmxid" test, I think.)

I think this is a reasonable line of thinking, but I think it's a
little imprecise. In general, we could be vacuuming a relation to
advance relfrozenxid, but we could also be vacuuming a relation to
advance relminmxid, or we could be vacuuming a relation to fight
bloat, or set pages all-visible. It is possible that there's no hope
of advancing relfrozenxid but that we can still accomplish one of the
other goals. In that case, the vacuuming is not useless. I think the
place to put logic around this would be in the triggering logic for
autovacuum. If we're going to force a relation to be vacuumed because
of (M)XID wraparound danger, we could first check whether there seems
to be any hope of advancing relfrozenxid(minmxid). If not, we discount
that as a trigger for vacuum, but may still decide to vacuum if some
other trigger warrants it. In most cases, if there's no hope of
advancing relfrozenxid, there won't be any bloat to remove either, but
aborted transactions are a counterexample. And the XID and MXID
horizons can advance at completely different rates.
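
A rough sketch of that triggering logic, in Python rather than the C that
autovacuum is actually written in. Every name here (RelState,
should_autovacuum, the *_forced flags, dead_tuple_threshold) is hypothetical
and not taken from PostgreSQL, and plain integer comparison stands in for
real modulo-2^32 XID arithmetic:

```python
from dataclasses import dataclass


@dataclass
class RelState:
    """Hypothetical snapshot of the per-relation state the trigger looks at."""
    relfrozenxid: int
    relminmxid: int
    dead_tuples: int


def should_autovacuum(rel, oldest_xmin, oldest_mxid,
                      xid_forced, mxid_forced, dead_tuple_threshold):
    """Decide whether to vacuum, discounting hopeless wraparound triggers.

    Assumption: XIDs are plain increasing integers (no wraparound), so
    "there is hope of advancing" is simply "the horizon is past the
    stored value".
    """
    # A forced (M)XID trigger only counts if the horizon could actually move.
    xid_trigger = xid_forced and oldest_xmin > rel.relfrozenxid
    mxid_trigger = mxid_forced and oldest_mxid > rel.relminmxid
    # Bloat is judged independently; aborted transactions can leave dead
    # tuples behind even when relfrozenxid cannot advance.
    bloat_trigger = rel.dead_tuples > dead_tuple_threshold
    return xid_trigger or mxid_trigger or bloat_trigger
```

The point of keeping the three triggers separate is that discounting one of
them never suppresses a vacuum that another one still justifies.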

One reason I haven't pursued this kind of optimization is that it
doesn't really feel like it's fixing the whole problem. It would be a
little bit sad if we did a perfect job preventing useless vacuuming
but still allowed almost-useless vacuuming. Suppose we have a 1TB
relation and we trigger autovacuum. It cleans up a few things but
relfrozenxid is still old. On the next pass, we see that the
system-wide xmin has not advanced, so we don't trigger autovacuum
again. Then on the pass after that we see that the system-wide xmin
has advanced by 1. Shall we trigger an autovacuum of the whole
relation now, to be able to do relfrozenxid++? Seems dubious.
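
To make the scenario concrete, here is a minimal sketch of an
exact-equality skip test applied across two later passes. The function
names and XID values are made up for illustration, and XIDs are treated as
plain integers with no wraparound:

```python
def is_useless_vacuum(relfrozenxid, oldest_xmin):
    """Exact-equality test: relfrozenxid cannot move at all."""
    return relfrozenxid == oldest_xmin


def max_possible_gain(relfrozenxid, oldest_xmin):
    """Upper bound on how far relfrozenxid could advance (no wraparound)."""
    return max(0, oldest_xmin - relfrozenxid)


relfrozenxid = 1000  # hypothetical value left behind by the first vacuum
for oldest_xmin in (1000, 1001):  # second pass, then third pass
    skip = is_useless_vacuum(relfrozenxid, oldest_xmin)
    gain = max_possible_gain(relfrozenxid, oldest_xmin)
    print(f"xmin={oldest_xmin} skip={skip} best-case gain={gain}")
```

On the second pass the test skips the vacuum; on the third it no longer
does, even though the best-case gain is a single XID.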

Part of the problem here is that, for both vacuuming-for-bloat and
vacuuming-for-relfrozenxid-advancement, we would really like to know
the distribution of old XIDs in the table. If we knew that a lot of
the inserts, updates, and deletes that are causing us to vacuum for
bloat containment were in a certain relatively narrow range, then we'd
probably want to not autovacuum for either purpose until the
system-wide xmin has crossed through at least a good chunk of that
range. And once it has fully crossed over that range, an immediate vacuum
looks extremely appealing: we'll both remove a bunch of dead tuples
and reclaim the associated line pointers, and at the same time we'll
be able to advance relfrozenxid. Nice! But we have no such
information.
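
Purely as an illustration of what such information would buy, suppose we
somehow knew the XID range [xid_low, xid_high] containing most of the
modifications. Both that range and the fraction_crossed helper below are
hypothetical; nothing like this is tracked today, and XIDs are again
treated as plain integers:

```python
def fraction_crossed(xid_low, xid_high, oldest_xmin):
    """Fraction of the XID range [xid_low, xid_high] that is older than
    oldest_xmin, clamped to [0.0, 1.0].

    Assumes xid_low <= xid_high and no XID wraparound.
    """
    width = xid_high - xid_low + 1          # number of XIDs in the range
    crossed = (oldest_xmin - xid_low) / width
    return min(1.0, max(0.0, crossed))
```

A trigger could then hold off while the fraction is small and treat a value
of 1.0 as the moment when a vacuum both removes the dead tuples and lets
relfrozenxid advance.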

So I'm not certain of the way forward here. Just because we can't
prevent almost-useless vacuuming is not a sufficient reason to
continue allowing entirely-useless vacuuming that we can prevent. And
it seems like we need a bunch of new bookkeeping to do any better than
that, which seems like a lot of work. So maybe it's the most practical
path forward for the time being, but it feels like more of a
special-purpose kludge than a truly high-quality solution.

--
Robert Haas
EDB: http://www.enterprisedb.com
