Re: New IndexAM API controlling index vacuum strategies

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Victor Yegorov <vyegorov(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New IndexAM API controlling index vacuum strategies
Date: 2021-02-02 20:33:14
Message-ID: CAH2-Wzm8kkRpW-x1hwOkwOZruV2dbON6X-a_ky5W8Em2CvhFBg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 2, 2021 at 6:28 AM Victor Yegorov <vyegorov(at)gmail(dot)com> wrote:
> I really like this idea!

Cool!

> It resembles the approach used in bottom-up index deletion, block-based
> accounting provides a better estimate for the usefulness of the operation.

It does resemble bottom-up index deletion, in one important general
sense: it is somewhat qualitative (though *also* somewhat quantitive).
This is new for vacuumlazy.c. But the idea now is to deemphasize
bottom-up index deletion heavy workloads in the first version of this
patch -- just to cut scope.

The design I described yesterday centers around "~99.9% append-only
table" workloads, where anti-wraparound vacuums that scan indexes are
a big source of unnecessary work (in practice it is always
anti-wraparound vacuums, simply because there will never be enough
garbage to trigger a regular autovacuum run). But it now occurs to me
that there is another very important case that it will also help,
without making the triggering condition for index vacuuming any more
complicated: it will help cases where HOT updates are expected
(because all updates don't modify indexed columns).

It's practically impossible for HOT updates to occur 100% of the time,
even with workloads whose updates never modify indexed columns. You
can clearly see this by looking at the stats from pg_stat_user_tables
with a standard pgbench workload. It does get better with lower heap
fill factor, but I think that HOT is never 100% effective (i.e. 100%
of updates are HOT updates) in the real world -- unless maybe you set
heap fillfactor as low as 50, which is very rare. HOT might well be
95% effective, or 99% effective, but it's never truly 100% effective.
And so this is another important workload where the difference between
"practically zero dead tuples" and "precisely zero dead tuples"
*really* matters when deciding if a VACUUM operation needs to go
ahead.

Once again, a small difference, but also a big difference. Forgive me
for repeating myself do much, but: paying attention to cost/benefit
asymmetries like this one sometimes allow us to recognize an
optimization that is an "excellent deal". We saw this with bottom-up
index deletion. Seems good to keep an eye out for that.

> I suppose that 1% threshold should be configurable as a cluster-wide GUC
> and also as a table storage parameter?

Possibly. I'm concerned about making any user-visible interface (say a
GUC) compatible with an improved version that is smarter about
bottom-up index deletion (in particular, one that can vacuum only a
subset of the indexes on a table, which now seems too ambitious for
Postgres 14).

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2021-02-02 20:33:35 Re: Support for NSS as a libpq TLS backend
Previous Message Michail Nikolaev 2021-02-02 20:31:00 Re: Thoughts on "killed tuples" index hint bits support on standby