Re: New IndexAM API controlling index vacuum strategies

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New IndexAM API controlling index vacuum strategies
Date: 2021-01-25 08:19:43
Message-ID: CAD21AoCS94vK1fs-_=R5J3tp2DsZPq9zdcUu2pk6fbr7BS7quA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 21, 2021 at 11:24 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, Jan 20, 2021 at 7:58 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> >
> > On Sun, Jan 17, 2021 at 9:18 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > After more thought, I think that ambulkdelete needs to be able to
> > > refer to the answer returned by amvacuumstrategy. That way, the index
> > > can skip bulk-deletion when lazy vacuum doesn't vacuum the heap and
> > > the index doesn't want bulk-deletion either.
> >
> > Makes sense.
> >
> > BTW, your patch has bitrot already. Peter E's recent pageinspect
> > commit happens to conflict with this patch. It might make sense to
> > produce a new version that just fixes the bitrot, so that other people
> > don't have to deal with it each time.
> >
> > > I’ve attached the updated version patch that includes the following changes:
> >
> > Looks good. I'll give this version a review now. I will do a lot more
> > soon. I need to come up with a good benchmark for this, that I can
> > return to again and again during review as needed.
>
> Thank you for reviewing the patches.
>
> >
> > Some feedback on the first patch:
> >
> > * Just so you know: I agree with you about handling
> > VACOPT_TERNARY_DISABLED in the index AM's amvacuumstrategy routine. I
> > think that it's better to do that there, even though this choice may
> > have some downsides.
> >
> > * Can you add some "stub" sgml doc changes for this? Doesn't have to
> > be complete in any way. Just a placeholder for later, that has the
> > correct general "shape" to orientate the reader of the patch. It can
> > just be a FIXME comment, plus basic mechanical stuff -- details of the
> > new amvacuumstrategy_function routine and its signature.
> >
>
> The 0002 patch had the doc update (I mistakenly included it in the
> 0002 patch). Is that update what you meant?
>
> > Some feedback on the second patch:
> >
> > * Why do you move around IndexVacuumStrategy in the second patch?
> > Looks like a rebasing oversight.
>
> Check.
>
> >
> > * Actually, do we really need the first and second patches to be
> > separate patches? I agree that the nbtree patch should be a separate
> > patch, but dividing the first two sets of changes doesn't seem like it
> > adds much. Did I miss some something?
>
> I separated the patches because I had used separate patches when
> adding other index AM options required by parallel vacuum. But I agree
> with merging the first two patches.
>
> >
> > * Is the "MAXALIGN(sizeof(ItemIdData)))" change in the definition of
> > MaxHeapTuplesPerPage appropriate? Here is the relevant section from
> > the patch:
> >
> > diff --git a/src/include/access/htup_details.h
> > b/src/include/access/htup_details.h
> > index 7c62852e7f..038e7cd580 100644
> > --- a/src/include/access/htup_details.h
> > +++ b/src/include/access/htup_details.h
> > @@ -563,17 +563,18 @@ do { \
> > /*
> > * MaxHeapTuplesPerPage is an upper bound on the number of tuples that can
> > * fit on one heap page. (Note that indexes could have more, because they
> > - * use a smaller tuple header.) We arrive at the divisor because each tuple
> > - * must be maxaligned, and it must have an associated line pointer.
> > + * use a smaller tuple header.) We arrive at the divisor because each line
> > + * pointer must be maxaligned.
> > *** SNIP ***
> > #define MaxHeapTuplesPerPage \
> > - ((int) ((BLCKSZ - SizeOfPageHeaderData) / \
> > - (MAXALIGN(SizeofHeapTupleHeader) + sizeof(ItemIdData))))
> > + ((int) ((BLCKSZ - SizeOfPageHeaderData) / (MAXALIGN(sizeof(ItemIdData)))))
> >
> > It's true that ItemIdData structs (line pointers) are aligned, but
> > they're not MAXALIGN()'d. If they were then the on-disk size of line
> > pointers would generally be 8 bytes, not 4 bytes.
>
> You're right. Will fix.
>
> >
> > * Maybe it would be better if you just changed the definition such
> > that "MAXALIGN(SizeofHeapTupleHeader)" became "MAXIMUM_ALIGNOF", with
> > no other changes? (Some variant of this suggestion might be better,
> > not sure.)
> >
> > For some reason that feels a bit safer: we still have an "imaginary
> > tuple header", but it's just 1 MAXALIGN() quantum now. This is still
> > much less than the current 3 MAXALIGN() quantums (i.e. what
> > MaxHeapTuplesPerPage treats as the tuple header size). Do you think
> > that this alternative approach will be noticeably less effective
> > within vacuumlazy.c?
> >
> > Note that you probably understand the issue with MaxHeapTuplesPerPage
> > for vacuumlazy.c better than I do currently. I'm still trying to
> > understand your choices, and to understand what is really important
> > here.
>
> Yeah, using MAXIMUM_ALIGNOF seems better for safety. I shared my
> thoughts on the issue with MaxHeapTuplesPerPage yesterday. I think we
> need to discuss how to deal with that.
>
> >
> > * Maybe add a #define for the value 0.7? (I refer to the value used in
> > choose_vacuum_strategy() to calculate a "this is the number of LP_DEAD
> > line pointers that we consider too many" cut off point, which is to be
> > applied throughout lazy_scan_heap() processing.)
> >
>
> Agreed.
>
> > * I notice that your new lazy_vacuum_table_and_indexes() function is
> > the only place that calls lazy_vacuum_all_indexes(). I think that you
> > should merge them together -- replace the only remaining call to
> > lazy_vacuum_all_indexes() with the body of the function itself. Having
> > a separate lazy_vacuum_all_indexes() function doesn't seem useful to
> > me -- it doesn't actually hide complexity, and might even be harder
> > to maintain.
>
> lazy_vacuum_table_and_indexes() is called at two places: after
> maintenance_work_mem runs out (around L1097) and at the end of
> lazy_scan_heap() (around L1726). I defined this function to package up
> the operations of choosing a strategy, vacuuming indexes, and vacuuming
> the heap. Without this function, wouldn't we end up writing the same
> code twice in those places?
>
> >
> > * I suggest thinking about what the last item will mean for the
> > reporting that currently takes place in lazy_vacuum_all_indexes(),
> > but will now go in an expanded lazy_vacuum_table_and_indexes() -- how
> > do we count the total number of index scans now?
> >
> > I don't actually believe that the logic needs to change, but some kind
> > of consolidation and streamlining seems like it might be helpful.
> > Maybe just a comment that says "note that all index scans might just
> > be no-ops because..." -- stuff like that.
>
> What do you mean by the last item, and which reporting? I think
> lazy_vacuum_table_and_indexes() itself doesn't report anything, and
> vacrelstats->num_index_scans counts the total number of index scans.
>
> >
> > * Any idea about how hard it will be to teach is_wraparound VACUUMs to
> > skip index cleanup automatically, based on some practical/sensible
> > criteria?
>
> One simple idea would be to have an anti-wraparound autovacuum worker
> disable index cleanup (i.e., set index_cleanup to
> VACOPT_TERNARY_DISABLED). But a downside (though not a common case) is
> that, since an anti-wraparound vacuum is not necessarily an aggressive
> vacuum, it could skip index cleanup even though it cannot move
> relfrozenxid forward.
>
> >
> > It would be nice to have a basic PoC for that, even if it remains a
> > PoC for the foreseeable future (i.e. even if it cannot be committed to
> > Postgres 14). This feature should definitely be something that your
> > patch series *enables*. I'd feel good about having covered that
> > question as part of this basic design work if there was a PoC. That
> > alone should make it 100% clear that it's easy to do (or no harder
> > than it should be -- it should ideally be compatible with your basic
> > design). The exact criteria that we use for deciding whether or not to
> > skip index cleanup (which probably should not just be "this VACUUM is
> > is_wraparound, good enough" in the final version) may need to be
> > debated at length on pgsql-hackers. Even still, it is "just a detail"
> > in the code. Whereas being *able* to do that with your design (now or
> > in the future) seems essential now.
>
> Agreed. I'll write a PoC patch for that.
>
> >
> > > * Store the answers to amvacuumstrategy in either local memory or
> > > DSM (in the parallel vacuum case) so that ambulkdelete can refer to
> > > the answer returned by amvacuumstrategy.
> > > * Fix regression failures.
> > > * Update the documentation and comments.
> > >
> > > Note that 0003 patch is still PoC quality, lacking the btree meta page
> > > version upgrade.
> >
> > This patch is not the hard part, of course -- there really isn't that
> > much needed here compared to vacuumlazy.c. So this patch seems like
> > the simplest of the three (at least to me).
> >
> > Some feedback on the third patch:
> >
> > * The new btm_last_deletion_nblocks metapage field should use P_NONE
> > (which is 0) to indicate never having been vacuumed -- not
> > InvalidBlockNumber (which is 0xFFFFFFFF).
> >
> > This is more idiomatic in nbtree, which is nice, but it has a very
> > significant practical advantage: it ensures that every heapkeyspace
> > nbtree index (i.e. those on recent nbtree versions) can be treated as
> > if it has had the new btm_last_deletion_nblocks field all along, even
> > when it was actually built on Postgres 12 or 13. This trick will let
> > you avoid dealing with the headache of bumping BTREE_VERSION, which
> > is a huge advantage.
> >
> > Note that this is the same trick I used to avoid bumping BTREE_VERSION
> > when the btm_allequalimage field needed to be added (for the nbtree
> > deduplication feature added to Postgres 13).
> >
>
> That's a nice way with a great advantage. I'll use P_NONE.
>
> > * Forgot to do this in the third patch (think I made this same mistake
> > once myself):
> >
> > diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
> > index 8bb180bbbe..88dfea9af3 100644
> > --- a/contrib/pageinspect/btreefuncs.c
> > +++ b/contrib/pageinspect/btreefuncs.c
> > @@ -653,7 +653,7 @@ bt_metap(PG_FUNCTION_ARGS)
> > BTMetaPageData *metad;
> > TupleDesc tupleDesc;
> > int j;
> > - char *values[9];
> > + char *values[10];
> > Buffer buffer;
> > Page page;
> > HeapTuple tuple;
> > @@ -734,6 +734,11 @@ bt_metap(PG_FUNCTION_ARGS)
>
> Check.
>
> I'm updating and testing the patch. I'll submit the updated version
> patches tomorrow.
>

Sorry for the delay.

I've attached the updated version patches, which incorporate the
comments I've received so far.

I merged the previous 0001 and 0002 patches. The 0003 patch is now
another PoC patch that disables index cleanup automatically when an
autovacuum is both an anti-wraparound vacuum and an aggressive vacuum.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachment Content-Type Size
v3-0002-Skip-btree-bulkdelete-if-the-index-doesn-t-grow.patch application/octet-stream 9.5 KB
v3-0003-PoC-disable-index-cleanup-when-an-anti-wraparound.patch application/octet-stream 1.4 KB
v3-0001-Choose-vacuum-strategy-before-heap-and-index-vacu.patch application/octet-stream 67.9 KB
