Quick Links

Re: New IndexAM API controlling index vacuum strategies

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Noah Misch <noah(at)leadboat(dot)com>
Subject:	Re: New IndexAM API controlling index vacuum strategies
Date:	2021-04-14 21:55:36
Message-ID:	CAH2-Wz=YP_y+P9K9V13fL-dwA9xwkP2EmxKGNkP64bftMAc9_Q@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Apr 14, 2021 at 12:33 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> I'm getting a bit bothered by the speed at which you're pushing fairly
> substantial behavioural for vacuum. In this case without even a warning
> that you're about to do so.

To a large degree the failsafe is something that is written in the
hope that it will never be needed. This is unlike most other things,
and has its own unique risks.

I think that the proper thing to do is to accept a certain amount of
risk in this area. The previous status quo was *appalling*, and so it
seems very unlikely that the failsafe hasn't mostly eliminated a lot
of risk for users. That factor is not everything, but it should count
for a lot. The only way that we're going to have total confidence in
anything like this is through the experience of it mostly working over
several releases.

> I don't think it's that blindingly obvious that skipping truncation is
> the right thing to do that it doesn't need review. Consider e.g. the
> case that you're close to wraparound because you ran out of space for
> the amount of WAL VACUUM produces, previously leading to autovacuums
> being aborted / the server restarted. The user might then stop regular
> activity and try to VACUUM. Skipping the truncation might now make it
> harder to actually vacuum all the tables without running out of space.

Note that the criteria for whether or not "hastup=false" for a page is
slightly different in lazy_scan_prune() -- I added a comment that
points this out directly (the fact that it works that way is not new,
and might have originally been a happy mistake). Unlike
count_nondeletable_pages(), which is used by heap truncation,
lazy_scan_prune() is concerned about whether or not it's *likely to be
possible* to truncate away the page by the time lazy_truncate_heap()
is reached (if it is reached at all). And so it's optimistic about
LP_DEAD items that it observes being removed by
lazy_vacuum_heap_page() before we get to lazy_truncate_heap(). It's
inherently race-prone anyway, so it might as well assume that LP_DEAD
items will eventually become LP_UNUSED items later on.

It follows that the chances of lazy_truncate_heap() failing to
truncate anything when the failsafe has already triggered are
exceptionally high -- all the LP_DEAD items are still there, and
cannot be safely removed during truncation (for the usual reasons). I
just went one step further than that in this recent commit. I didn't
point these details out before now because (to me) this is beside the
point. Which is that the failsafe is just that -- a failsafe. Anything
that adds unnecessary unpredictable delay in reaching the point of
advancing relfrozenxid should be avoided. (Besides, the design of
should_attempt_truncation() and lazy_truncate_heap() is very far from
guaranteeing that truncation will take place at the best of times.)

FWIW, my intention is to try to get as much feedback about the
failsafe as I possibly can -- it's hard to reason about exceptional
events. I'm also happy to further discuss the specifics with you now.

--
Peter Geoghegan

In response to

Re: New IndexAM API controlling index vacuum strategies at 2021-04-14 19:33:10 from Andres Freund

Responses

Re: New IndexAM API controlling index vacuum strategies at 2021-04-15 00:08:10 from Robert Haas
Re: New IndexAM API controlling index vacuum strategies at 2021-04-15 01:53:04 from Andres Freund

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Mark Dilger	2021-04-14 22:18:48	Re: Converting contrib SQL functions to new style
Previous Message	Vik Fearing	2021-04-14 21:47:53	Re: Converting contrib SQL functions to new style