From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New IndexAM API controlling index vacuum strategies
Date: 2021-01-21 14:24:10
Message-ID: CAD21AoCmhri3a2NQXN4NN5c4SBX+imBhm0_UOxBsd3xdN5XWmg@mail.gmail.com
Lists: pgsql-hackers
On Wed, Jan 20, 2021 at 7:58 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Sun, Jan 17, 2021 at 9:18 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > After more thought, I think that ambulkdelete needs to be able to
> > refer to the answer of amvacuumstrategy. That way, the index can skip
> > bulk-deletion when lazy vacuum doesn't vacuum the heap and the index
> > also doesn't want to do that.
>
> Makes sense.
>
> BTW, your patch has bitrot already. Peter E's recent pageinspect
> commit happens to conflict with this patch. It might make sense to
> produce a new version that just fixes the bitrot, so that other people
> don't have to deal with it each time.
>
> > I’ve attached the updated version patch that includes the following changes:
>
> Looks good. I'll give this version a review now. I will do a lot more
> soon. I need to come up with a good benchmark for this, that I can
> return to again and again during review as needed.
Thank you for reviewing the patches.
>
> Some feedback on the first patch:
>
> * Just so you know: I agree with you about handling
> VACOPT_TERNARY_DISABLED in the index AM's amvacuumstrategy routine. I
> think that it's better to do that there, even though this choice may
> have some downsides.
>
> * Can you add some "stub" sgml doc changes for this? Doesn't have to
> be complete in any way. Just a placeholder for later, that has the
> correct general "shape" to orientate the reader of the patch. It can
> just be a FIXME comment, plus basic mechanical stuff -- details of the
> new amvacuumstrategy_function routine and its signature.
>
The 0002 patch had the doc update (I mistakenly included it in the 0002
patch). Is that the update you meant?
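For reference, the general shape of the new callback is roughly the
following (an illustrative sketch only; the concrete enum values and the
exact signature are whatever the current patch defines):

    typedef enum IndexVacuumStrategy
    {
        INDEX_VACUUM_STRATEGY_NONE,       /* index chooses to skip bulk-deletion */
        INDEX_VACUUM_STRATEGY_BULKDELETE  /* index wants ambulkdelete as usual */
    } IndexVacuumStrategy;

    /* callback member added to IndexAmRoutine */
    typedef IndexVacuumStrategy (*amvacuumstrategy_function) (IndexVacuumInfo *info);

Lazy vacuum then stores the returned value (in local memory, or in DSM for
parallel vacuum) so that the later ambulkdelete call can consult it.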
> Some feedback on the second patch:
>
> * Why do you move around IndexVacuumStrategy in the second patch?
> Looks like a rebasing oversight.
Check.
>
> * Actually, do we really need the first and second patches to be
> separate patches? I agree that the nbtree patch should be a separate
> patch, but dividing the first two sets of changes doesn't seem like it
> adds much. Did I miss something?
I separated the patches because I had used separate patches when adding the
other index AM options required by parallel vacuum. But I agree to merge
the first two patches.
>
> * Is the "MAXALIGN(sizeof(ItemIdData)))" change in the definition of
> MaxHeapTuplesPerPage appropriate? Here is the relevant section from
> the patch:
>
> diff --git a/src/include/access/htup_details.h
> b/src/include/access/htup_details.h
> index 7c62852e7f..038e7cd580 100644
> --- a/src/include/access/htup_details.h
> +++ b/src/include/access/htup_details.h
> @@ -563,17 +563,18 @@ do { \
> /*
> * MaxHeapTuplesPerPage is an upper bound on the number of tuples that can
> * fit on one heap page. (Note that indexes could have more, because they
> - * use a smaller tuple header.) We arrive at the divisor because each tuple
> - * must be maxaligned, and it must have an associated line pointer.
> + * use a smaller tuple header.) We arrive at the divisor because each line
> + * pointer must be maxaligned.
> *** SNIP ***
> #define MaxHeapTuplesPerPage \
> - ((int) ((BLCKSZ - SizeOfPageHeaderData) / \
> - (MAXALIGN(SizeofHeapTupleHeader) + sizeof(ItemIdData))))
> + ((int) ((BLCKSZ - SizeOfPageHeaderData) / (MAXALIGN(sizeof(ItemIdData)))))
>
> It's true that ItemIdData structs (line pointers) are aligned, but
> they're not MAXALIGN()'d. If they were then the on-disk size of line
> pointers would generally be 8 bytes, not 4 bytes.
You're right. Will fix.
>
> * Maybe it would be better if you just changed the definition such
> that "MAXALIGN(SizeofHeapTupleHeader)" became "MAXIMUM_ALIGNOF", with
> no other changes? (Some variant of this suggestion might be better,
> not sure.)
>
> For some reason that feels a bit safer: we still have an "imaginary
> tuple header", but it's just 1 MAXALIGN() quantum now. This is still
> much less than the current 3 MAXALIGN() quantums (i.e. what
> MaxHeapTuplesPerPage treats as the tuple header size). Do you think
> that this alternative approach will be noticeably less effective
> within vacuumlazy.c?
>
> Note that you probably understand the issue with MaxHeapTuplesPerPage
> for vacuumlazy.c better than I do currently. I'm still trying to
> understand your choices, and to understand what is really important
> here.
Yeah, using MAXIMUM_ALIGNOF seems better for safety. I shared my
thoughts on the issue with MaxHeapTuplesPerPage yesterday. I think we
need to discuss how to deal with that.
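If I understand the suggestion correctly, the definition would then look
something like this (an untested sketch of the alternative):

    #define MaxHeapTuplesPerPage \
        ((int) ((BLCKSZ - SizeOfPageHeaderData) / \
                (MAXIMUM_ALIGNOF + sizeof(ItemIdData))))

That keeps an "imaginary tuple header" of one MAXALIGN() quantum per tuple
plus its line pointer, as you described.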
>
> * Maybe add a #define for the value 0.7? (I refer to the value used in
> choose_vacuum_strategy() to calculate a "this is the number of LP_DEAD
> line pointers that we consider too many" cut off point, which is to be
> applied throughout lazy_scan_heap() processing.)
>
Agreed.
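Something like the following, for example (the constant name here is just a
placeholder I made up, not the final name):

    /*
     * Ratio of LP_DEAD line pointers on a heap page above which
     * choose_vacuum_strategy() considers the page to have "too many"
     * dead line pointers.
     */
    #define VACUUM_TOO_MANY_DEAD_ITEMS_RATIO    0.7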
> * I notice that your new lazy_vacuum_table_and_indexes() function is
> the only place that calls lazy_vacuum_table_and_indexes(). I think
> that you should merge them together -- replace the only remaining call
> to lazy_vacuum_table_and_indexes() with the body of the function
> itself. Having a separate lazy_vacuum_table_and_indexes() function
> doesn't seem useful to me -- it doesn't actually hide complexity, and
> might even be harder to maintain.
lazy_vacuum_table_and_indexes() is called from two places: after
maintenance_work_mem runs out (around L1097) and at the end of
lazy_scan_heap() (around L1726). I defined this function to bundle the
operations of choosing a strategy, vacuuming indexes, and vacuuming the
heap. Without this function, wouldn't we end up writing the same code twice
in those places?
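To illustrate, the rough shape of what it bundles is (a simplified sketch,
not the actual patch code; choose_vacuum_strategy()'s real arguments
differ):

    /* decide whether to vacuum indexes and/or the heap in this round */
    choose_vacuum_strategy(vacrelstats);

    /* remove index entries pointing to the collected dead tuples */
    lazy_vacuum_all_indexes(onerel, Irel, indstats, vacrelstats, lps, nindexes);

    /* remove dead tuples from the heap and mark their line pointers unused */
    lazy_vacuum_heap(onerel, vacrelstats);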
>
> * I suggest thinking about what the last item will mean for the
> reporting that currently takes place in
> lazy_vacuum_table_and_indexes(), but will now go in an expanded
> lazy_vacuum_table_and_indexes() -- how do we count the total number of
> index scans now?
>
> I don't actually believe that the logic needs to change, but some kind
> of consolidation and streamlining seems like it might be helpful.
> Maybe just a comment that says "note that all index scans might just
> be no-ops because..." -- stuff like that.
What do you mean by the last item, and which report? I think
lazy_vacuum_table_and_indexes() itself doesn't report anything, and
vacrelstats->num_index_scans counts the total number of index scans.
>
> * Any idea about how hard it will be to teach is_wraparound VACUUMs to
> skip index cleanup automatically, based on some practical/sensible
> criteria?
One simple idea would be to have an anti-wraparound autovacuum worker
disable index cleanup (i.e., set index_cleanup to
VACOPT_TERNARY_DISABLED). But a downside (though not a common case) is
that, since an anti-wraparound vacuum is not necessarily an aggressive
vacuum, it could skip index cleanup even though it cannot advance
relfrozenxid.
>
> It would be nice to have a basic PoC for that, even if it remains a
> PoC for the foreseeable future (i.e. even if it cannot be committed to
> Postgres 14). This feature should definitely be something that your
> patch series *enables*. I'd feel good about having covered that
> question as part of this basic design work if there was a PoC. That
> alone should make it 100% clear that it's easy to do (or no harder
> than it should be -- it should ideally be compatible with your basic
> design). The exact criteria that we use for deciding whether or not to
> skip index cleanup (which probably should not just be "this VACUUM is
> is_wraparound, good enough" in the final version) may need to be
> debated at length on pgsql-hackers. Even still, it is "just a detail"
> in the code. Whereas being *able* to do that with your design (now or
> in the future) seems essential now.
Agreed. I'll write a PoC patch for that.
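As a very rough sketch of that PoC (assuming it is acceptable to key off
VacuumParams; exactly where this check lives, and the final criteria, are
up for discussion):

    /*
     * If this is an anti-wraparound autovacuum and the user has not set
     * INDEX_CLEANUP explicitly, skip index cleanup.
     */
    if (params->is_wraparound &&
        params->index_cleanup == VACOPT_TERNARY_DEFAULT)
        params->index_cleanup = VACOPT_TERNARY_DISABLED;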
>
> > * Store the answers to amvacuumstrategy into either the local memory
> > or DSM (in parallel vacuum case) so that ambulkdelete can refer the
> > answer to amvacuumstrategy.
> > * Fix regression failures.
> > * Update the documentation and comments.
> >
> > Note that 0003 patch is still PoC quality, lacking the btree meta page
> > version upgrade.
>
> This patch is not the hard part, of course -- there really isn't that
> much needed here compared to vacuumlazy.c. So this patch seems like
> the simplest 1 out of the 3 (at least to me).
>
> Some feedback on the third patch:
>
> * The new btm_last_deletion_nblocks metapage field should use P_NONE
> (which is 0) to indicate never having been vacuumed -- not
> InvalidBlockNumber (which is 0xFFFFFFFF).
>
> This is more idiomatic in nbtree, which is nice, but it has a very
> significant practical advantage: it ensures that every heapkeyspace
> nbtree index (i.e. those on recent nbtree versions) can be treated as
> if it has the new btm_last_deletion_nblocks field all along, even when
> it actually built on Postgres 12 or 13. This trick will let you avoid
> dealing with the headache of bumping BTREE_VERSION, which is a huge
> advantage.
>
> Note that this is the same trick I used to avoid bumping BTREE_VERSION
> when the btm_allequalimage field needed to be added (for the nbtree
> deduplication feature added to Postgres 13).
>
That's a nice approach with a great advantage. I'll use P_NONE.
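Just to confirm my understanding of the trick, a reader could then safely
do something like this (sketch only):

    /*
     * Because P_NONE is 0, an index actually built on Postgres 12 or 13
     * can be treated as if it had the field all along: it simply reads
     * as "no deletion pass has ever happened", with no version bump.
     */
    if (metad->btm_last_deletion_nblocks == P_NONE)
    {
        /* never vacuumed (possibly built on Postgres 12 or 13) */
    }
    else
    {
        /* use the recorded number of blocks from the last deletion pass */
    }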
> * Forgot to do this in the third patch (think I made this same mistake
> once myself):
>
> diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
> index 8bb180bbbe..88dfea9af3 100644
> --- a/contrib/pageinspect/btreefuncs.c
> +++ b/contrib/pageinspect/btreefuncs.c
> @@ -653,7 +653,7 @@ bt_metap(PG_FUNCTION_ARGS)
> BTMetaPageData *metad;
> TupleDesc tupleDesc;
> int j;
> - char *values[9];
> + char *values[10];
> Buffer buffer;
> Page page;
> HeapTuple tuple;
> @@ -734,6 +734,11 @@ bt_metap(PG_FUNCTION_ARGS)
Check.
I'm now updating and testing the patches. I'll submit the updated versions
tomorrow.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/