Re: Emit fewer vacuum records by reaping removable tuples during pruning

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Emit fewer vacuum records by reaping removable tuples during pruning
Date: 2024-01-06 16:03:03
Message-ID: CAH2-WznapW1LsQ9Sr27m+veKGG+_mf3Ej2PjGex--tnK6jtT7Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 5, 2024 at 12:23 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > As I think we chatted about before, I eventually would like the option to
> > remove index entries for a tuple during on-access pruning, for OLTP
> > workloads. I.e. before removing the tuple, construct the corresponding index
> > tuple, use it to look up index entries pointing to the tuple. If all the index
> > entries were found (they might not be, if they already were marked dead during
> > a lookup, or if an expression wasn't actually immutable), we can prune without
> > the full index scan. Obviously this would only be suitable for some
> > workloads, but it could be quite beneficial when you have huge indexes. The
> > reason I mention this is that then we'd have another source of marking items
> > unused during pruning.
>
> I will be astonished if you can make this work well enough to avoid
> huge regressions in plausible cases. There are plenty of cases where
> we do a very thorough job opportunistically removing index tuples.

Right. In particular, bottom-up index deletion works well because it
adds a kind of natural backpressure to one important special case (the
case of non-HOT updates that don't "logically change" any indexed
column). It isn't really all that "opportunistic" in my understanding
of the term -- the overall effect is to *systematically* control bloat
in a way that is actually quite reliable. Like you, I have my doubts
that it would be valuable to be more proactive about deleting dead
index tuples that are just random dead tuples. There may be a great
many dead index tuples diffusely spread across an index -- these can
be quite harmless, and not worth proactively cleaning up (even at a
fairly low cost). What we mostly need to worry about is *concentrated*
build-up of dead index tuples in particular leaf pages.

A natural question to ask is: what cases remain, where we could stand
to add more backpressure? What other "special case" do we not yet
address? I think that retail index tuple deletion could work well as
part of a limited form of "transaction rollback" that cleans up after
a just-aborted transaction, within the backend that executed the
transaction itself. I suspect that this case now has outsized
importance, precisely because it's the one remaining case where the
system accumulates index bloat without any sort of natural
backpressure. Making the transaction/backend that creates bloat
directly responsible for proactively cleaning it up tends to have a
stabilizing effect over time. The system is made to live within its
means.

We could even fully reverse heap page line pointer bloat under this
"transaction rollback" scheme -- I bet that aborted xacts are a
disproportionate source of line pointer bloat. Barring a hard crash,
or a very large transaction, we could "undo" the physical changes to
relations before permitting the backend to retry the transaction from
scratch. This would just work as an optimization.

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-01-06 16:18:52 Re: verify predefined LWLocks have entries in wait_event_names.txt
Previous Message vignesh C 2024-01-06 15:51:35 Re: Bytea PL/Perl transform