Re: New IndexAM API controlling index vacuum strategies

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: New IndexAM API controlling index vacuum strategies
Date: 2021-04-06 14:05:09
Message-ID: CAEze2WgSfnr7aNCn13rG2aktMae0f8EbXNN+m-3+QVYX+-EQ6A@mail.gmail.com
Lists: pgsql-hackers

On Tue, 6 Apr 2021 at 05:13, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Mon, Apr 5, 2021 at 2:44 PM Matthias van de Meent
> <boekewurm+postgres(at)gmail(dot)com> wrote:
> > This assertion is false, and should be a guarding if-statement. HOT
> > redirect pointers are not updated if the tuple they're pointing to is
> > vacuumed (i.e. when it was never committed) so this nextoffnum might
> > in a correctly working system point past maxoff.
>
> I will need to go through this in detail soon.
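
To make that concrete, what I have in mind by a guarding if-statement is
roughly the shape below. This is only a sketch of the idea (not code from
the patch), and the right recovery action when the guard fires depends on
the caller:

    /*
     * A redirect can point past maxoff in a correctly working system,
     * because redirect pointers are not updated when the never-committed
     * tuple they point to is vacuumed away.
     */
    if (nextoffnum > maxoff)
        break;      /* rather than Assert(nextoffnum <= maxoff) */
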
>
> > > Line pointer truncation doesn't happen during pruning, as it did in
> > > Matthias' original patch. In this revised version, line pointer
> > > truncation occurs during the second phase of VACUUM. There are several
> > > reasons to prefer this approach. It seems both safer and more useful
> > > that way (compared to the approach of doing line pointer truncation
> > > during pruning). It also makes intuitive sense to do it this way, at
> > > least to me -- the second pass over the heap is supposed to be for
> > > "freeing" LP_DEAD line pointers.
> >
> > Good catch for running a line pointer truncating pass at the second
> > pass over the heap in VACUUM, but I believe that it is also very
> > useful for pruning. Line pointer bloat due to excessive HOT chains
> > cannot be undone until the 2nd run of VACUUM happens with this patch,
> > which is a missed chance for all non-vacuum pruning.
>
> Maybe - I have my doubts about it having much value outside of the
> more extreme cases. But let's assume that I'm wrong about that, for
> the sake of argument.
>
> The current plan is to no longer require a super-exclusive lock inside
> lazy_vacuum_heap_page(), which means that we can no longer safely call
> PageRepairFragmentation() at that point. This will mean that
> PageRepairFragmentation() is 100% owned by pruning. And so the
> question of whether or not line pointer truncation should also happen
> in PageRepairFragmentation() to cover pruning is (or will be) a
> totally separate question to the question of how
> lazy_vacuum_heap_page() does it. Nothing stops you from independently
> pursuing that as a project for Postgres 15.

Ah, then I misunderstood your intentions when you mentioned including
a modified version of my patch. In which case, I agree that improving
HOT pruning is indeed out of scope.

> > What difference is there between opportunistically pruned HOT line
> > pointers, and VACUUMed line pointers?
>
> The fact that they are physically identical to each other isn't
> everything. The "life cycle" of an affected page is crucially
> important.
>
> I find that there is a lot of value in thinking about how things look
> at the page level moment to moment, and even over hours and days.
> Usually with a sample workload and table in mind. I already mentioned
> the new_order table from TPC-C, which is characterized by continual
> churn from more-or-less even amounts of range deletes and bulk inserts
> over time. That seems to be the kind of workload where you see big
> problems with line pointer bloat. Because there is constant churn of
> unrelated logical rows (it's not a bunch of UPDATEs).
>
> It's possible for very small effects to aggregate into large and
> significant effects -- I know this from my experience with indexing.
> Plus the FSM is probably not very smart about fragmentation, which
> makes it even more complicated. And so it's easy to be wrong if you
> predict that some seemingly insignificant extra intervention couldn't
> possibly help. For that reason, I don't want to predict that you're
> wrong now. It's just a question of time, and of priorities.
>
> > Truncating during pruning has
> > the benefit of keeping the LP array short where possible, and seeing
> > that truncating the LP array allows for more applied
> > PD_HAS_FREE_LINES-optimization, I fail to see why you wouldn't want to
> > truncate the LP array whenever clearing up space.
>
> Truncating the line pointer array is not an intrinsic good. I hesitate
> to do it during pruning in the absence of clear evidence that it's
> independently useful. Pruning is a very performance sensitive
> operation. Much more so than VACUUM's second heap pass.
>
> > Other than those questions, some comments on the other patches:
> >
> > 0002:
> > > + appendStringInfo(&buf, _("There were %lld dead item identifiers.\n"),
> > > + (long long) vacrel->lpdead_item_pages);
> >
> > I presume this should use vacrel->lpdead_items?
>
> It should have been, but as it happens I have decided to not do this
> at all in 0002-*. Better to not report on LP_UNUSED *or* LP_DEAD items
> at this point of VACUUM VERBOSE output.
>
> > 0003:
> > > + * ... Aborted transactions
> > > + * have tuples that we can treat as DEAD without caring about where there
> > > + * tuple header XIDs ...
> >
> > This should be '... where their tuple header XIDs...'
>
> Will fix.
>
> > > +retry:
> > > +
> > > ...
> > > + res = HeapTupleSatisfiesVacuum(&tuple, vacrel->OldestXmin, buf);
> > > +
> > > + if (unlikely(res == HEAPTUPLE_DEAD))
> > > + goto retry;
> >
> > In this unlikely case, you reset the tuples_deleted value that was
> > received earlier from heap_page_prune. This results in inaccurate
> > statistics, as repeated calls to heap_page_prune on the same page will
> > not count tuples that were deleted in a previous call.
>
> I don't think that it matters. The "tupgone=true" case has no test
> coverage (see coverage.postgresql.org) and it would be hard to ensure
> that the "res == HEAPTUPLE_DEAD" that replaces it gets coverage, for
> the same reasons. Keeping the rules as simple as possible seems like a
> good goal. What's more, it's absurdly unlikely that this will happen
> even once. The race is very tight. Postgres will do opportunistic
> pruning at almost any point, often from a SELECT, so the chances of
> anybody noticing an inaccuracy from this issue in particular are
> remote in the extreme.
>
> Actually, a big problem with the tuples_deleted value surfaced by both
> log_autovacuum and VACUUM VERBOSE is that it can be wildly
> different to the number of LP_DEAD items. This is commonly the case
> with tables that get lots of non-HOT updates, with opportunistic
> pruning kicking in a lot, with LP_DEAD items constantly accumulating.
> By the time VACUUM comes around, it reports an absurdly low
> tuples_deleted because it's using this what-I-pruned-just-now
> definition. The opposite extreme is also possible, since there might
> be far fewer LP_DEAD items when VACUUM does a lot of pruning of HOT
> chains specifically.

That seems reasonable as well.

> > 0004:
> > > + * truncate to. Note that we avoid truncating the line pointer to 0 items
> > > + * in all cases.
> > > + */
> >
> > Is there a specific reason that I'm not getting as to why this is necessary?
>
> I didn't say it was strictly necessary. There is special-case handling
> of PageIsEmpty() at various points, though, including within VACUUM.
> It seemed worth avoiding hitting that.

That seems reasonable.

> Perhaps I should change it to not work that way.

All of the PageIsEmpty() checks on heap pages appear to be short-path
optimizations for handling empty pages in vacuum, so I'd say it is better
to fully truncate the array. That said, I'm fine with postponing that
specific change pending further analysis.
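
To illustrate what I mean by fully truncating: the operation amounts to
counting trailing unused line pointers and pulling pd_lower back over
them. A rough sketch (the helper name is made up, this is not the patch's
code, and it ignores the WAL and hint-bit details the real thing has to
handle):

    /* Assumes the usual bufpage.h macros; sketch only. */
    static void
    truncate_unused_lp_tail(Page page)
    {
        PageHeader   phdr = (PageHeader) page;
        OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
        int          nunusedend = 0;

        /* Count LP_UNUSED line pointers at the end of the array */
        for (OffsetNumber off = maxoff; off >= FirstOffsetNumber; off--)
        {
            ItemId      lp = PageGetItemId(page, off);

            if (ItemIdIsUsed(lp))
                break;
            nunusedend++;
        }

        /*
         * Shrink the array.  Without a one-item floor, an all-unused page
         * ends up with pd_lower back at SizeOfPageHeaderData, which is
         * exactly what PageIsEmpty() tests for.
         */
        if (nunusedend > 0)
            phdr->pd_lower -= sizeof(ItemIdData) * nunusedend;
    }

Whether keeping the one-item floor or going all the way down makes a
measurable difference is exactly the part I'd want to analyze further.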

> > 0005:
> > > + The default is 1.8 billion transactions. Although users can set this value
> > > + anywhere from zero to 2.1 billion, <command>VACUUM</command> will silently
> > > + adjust the effective value more than 105% of
> > > + <xref linkend="guc-autovacuum-freeze-max-age"/>, so that only
> > > + anti-wraparound autovacuums and aggressive scans have a chance to skip
> > > + index cleanup.
> >
> > This documentation doesn't quite make it clear what its relationship
> > is with autovacuum_freeze_max_age. How about the following: "...
> > >VACUUM< will use the higher of this value and 105% of
> > >guc-autovacuum-freeze-max-age<, so that only ...". It's only slightly
> > easier to read, but at least it conveys that values lower than 105% of
> > autovacuum_freeze_max_age are not considered. The same can be said for
> > the multixact guc documentation.
>
> This does need work too.
>
> I'm going to push 0002- and 0003- tomorrow morning pacific time. I'll
> publish a new set of patches tomorrow, once I've finished that up. The
> last 2 patches will require a lot of focus to get over the line for
> Postgres 14.

If you have updated patches, I'll try to check them this evening (CEST).
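
As a small addendum to the 0005 wording discussion above: the relationship
I'm suggesting the docs spell out is simply "take the higher of the two
values". A sketch with placeholder names (not taken from the patch):

    /* effective cutoff = max(setting, 105% of autovacuum_freeze_max_age) */
    static long long
    effective_failsafe_age(long long failsafe_age_setting,
                           long long autovacuum_freeze_max_age)
    {
        long long   floor_age = autovacuum_freeze_max_age +
                                autovacuum_freeze_max_age / 20;

        return (failsafe_age_setting > floor_age) ? failsafe_age_setting
                                                  : floor_age;
    }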

With regards,

Matthias van de Meent
