Re: New IndexAM API controlling index vacuum strategies

From: Victor Yegorov <vyegorov(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New IndexAM API controlling index vacuum strategies
Date: 2021-02-02 14:27:56
Message-ID: CAGnEboiYish96utN02u06NYr16AEEOzRBKfSeAkqsvS3EkAEqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

вт, 2 февр. 2021 г. в 05:27, Peter Geoghegan <pg(at)bowt(dot)ie>:

> And now here is the second thing I thought of, which is much better:
>
> Sometimes 1% of the dead tuples in a heap relation will be spread
> across 90%+ of the pages. With other workloads 1% of dead tuples might
> be highly concentrated, and appear in no more than 1% of all heap
> pages. Obviously the distinction between these two cases/workloads
> matters a lot. And so the triggering criteria must be quantitative
> *and* qualitative. It should not be based on counting dead tuples,
> since that alone won't differentiate these two extreme cases - both of
> which are probably quite common (in the real world extremes are
> actually the normal and common case IME).
>
> I like the idea of basing it on counting *heap blocks*, not dead
> tuples. We can count heap blocks that have *at least* one dead tuple
> (of course it doesn't matter how they're dead, whether it was this
> VACUUM operation or some earlier opportunistic pruning). Note in
> particular that it should not matter if it's a heap block that has
> only one LP_DEAD line pointer or a heap page that is near the
> MaxHeapTuplesPerPage limit for the page -- we count either type of
> page towards the heap-page based limit used to decide if index
> vacuuming goes ahead for all indexes during VACUUM.
>

I really like this idea!

It resembles the approach used in bottom-up index deletion, block-based
accounting provides a better estimate for the usefulness of the operation.

I suppose that 1% threshold should be configurable as a cluster-wide GUC
and also as a table storage parameter?

--
Victor Yegorov

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2021-02-02 14:54:30 Re: Add primary keys to system catalogs
Previous Message 'Alvaro Herrera' 2021-02-02 14:20:45 Re: libpq debug log