Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: Eagerly scan all-visible pages to amortize aggressive vacuum
Date: 2024-12-13 22:53:00
Message-ID: CAAKRu_avdCfrVFfCc09Mg1dJBKX1jc9ptumq0mMV_2c4oQATgw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 7, 2024 at 10:42 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,

Thanks for the review!
Attached v2 should address your feedback and also fixes a few bugs with v1.

I have yet to run very long benchmarks. I did start running more varied
benchmark scenarios -- though all still under two hours. So far, the behavior
is as expected.

> On 2024-11-01 19:35:22 -0400, Melanie Plageman wrote:
> > Because we want to amortize our eager scanning across a few vacuums,
> > we cap the maximum number of successful eager scans to a percentage of
> > the total number of all-visible but not all-frozen pages in the table
> > (currently 20%).
>
> One thing worth mentioning around here seems that we currently can't
> partially-aggressively freeze tuples that are "too young" and how that
> interacts with everything else.

I'm not sure I know what you mean. Are you talking about how we don't freeze
tuples that are visible to everyone but younger than the freeze limit?

> > In the attached chart.png, you can see the vm_page_freezes climbing
> > steadily with the patch, whereas on master, there are sudden spikes
> > aligned with the aggressive vacuums. You can also see that the number
> > of pages that are all-visible but not all-frozen grows steadily on
> > master until the aggressive vacuum. This is vacuum's "backlog" of
> > freezing work.
>
> What's the reason for all-visible-but-not-all-frozen to increase to a higher
> value initially than where it later settles?

My guess is that it has to do with the shorter, more frequent vacuums at the
beginning of the benchmark, when the relation is smaller (and we haven't
exceeded shared buffers or memory yet). They set pages all-visible, but we
haven't used up enough XIDs yet to qualify for an eager vacuum.

The peak of all-visible-but-not-all-frozen (AVnAF) pages aligns with the start
of the first eager vacuum. We don't do any eager scanning until we are sure
there is some data requiring freezing (see this criterion):

if (TransactionIdIsNormal(vacrel->cutoffs.relfrozenxid) &&
    TransactionIdPrecedesOrEquals(vacrel->cutoffs.relfrozenxid,
                                  vacrel->cutoffs.FreezeLimit))

Once we have used up enough XIDs to qualify for the first eager vacuum, the
number of AVnAF pages starts to go down.

It would follow from this theory that we would see a similar build-up after
each relfrozenxid advancement (i.e., after each aggressive vacuum).

But I think we don't see one because the vacuums are longer by the time
aggressive vacuums have started, so we use up enough XIDs between vacuums that
the vacuums following an aggressive vacuum qualify as eager.

That is just my theory though.

> > Below is the comparative WAL volume, checkpointer and background
> > writer writes, reads and writes done by all other backend types, time
> > spent vacuuming in milliseconds, and p99 latency. Notice that overall
> > vacuum IO time is substantially lower with the patch.
> >
> > version wal cptr_bgwriter_w other_rw vac_io_time p99_lat
> > patch 770 GB 5903264 235073744 513722 1
> > master 767 GB 5908523 216887764 1003654 16
>
> Hm. It's not clear to me why other_rw is higher with the patch? After all,
> given the workload, there's no chance of unnecessarily freezing tuples? Is
> that just because at the end of the benchmark there's leftover work?

So other_rw is mostly client backend and autovacuum reads and writes. It is
higher with the patch because there are actually more vacuum reads and writes
with the patch than on master. However, the autovacuum worker read and write
time is much lower; those blocks are more often found in shared buffers, I
would guess.

> > From 67781cc2511bb7d62ccc9461f1787272820abcc4 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Mon, 28 Oct 2024 11:07:50 -0400
> > Subject: [PATCH v1 4/9] Replace uses of blkno local variable in
> > lazy_scan_heap()
>
> Largely LGTM, but I'm not sure that it's worth having as a separate commit.

I've squashed it into the commit that makes heap_vac_scan_next_block() return
the next block number.

> > From 67b5565ad57d3b196695f85811dde2044ba79f3e Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Mon, 28 Oct 2024 11:14:24 -0400
> > Subject: [PATCH v1 5/9] Move vacuum VM buffer release
> >
> > The VM buffer for the next unskippable block can be released after the
> > main loop in lazy_scan_heap(). Doing so de-clutters
> > heap_vac_scan_next_block() and opens up more refactoring options.
>
> That's vague...

I've changed the commit message to justify the move by noting that all the
other vmbuffer releases in the vacuum code are also in the body of
lazy_scan_heap() (not in helpers).

> > From 8485dc400b3d4e9f895170af4f5fb1bb959b8495 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Mon, 28 Oct 2024 11:36:58 -0400
> > Subject: [PATCH v1 6/9] Remove superfluous next_block local variable in vacuum
> > code
> >
> > Reduce the number of block related variables in lazy_scan_heap() and its
> > helpers by removing the next_block local variable from
> > heap_vac_scan_next_block().
>
> I don't mind this change, but I also don't get how it's related to anything
> else here or why it's really better than the status quo.

Because this feature adds more complexity to the already complex vacuum code
that selects which blocks to scan, I thought it was important to reduce the
number of variables. I think the patches in this set that streamline
heap_vac_scan_next_block() help overall clarity.

> > diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
> > index 4b1eadea1f2..52c9d49f2b1 100644
> > --- a/src/backend/access/heap/vacuumlazy.c
> > +++ b/src/backend/access/heap/vacuumlazy.c
> > @@ -1112,19 +1112,17 @@ static bool
> > heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
> > bool *all_visible_according_to_vm)
> > {
> > - BlockNumber next_block;
> > -
> > /* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
> > - next_block = vacrel->current_block + 1;
> > + vacrel->current_block++;
>
> I realize this isn't introduced in this commit, but darn, that's ugly.

I didn't like having special cases for block 0 in heap_vac_scan_next_block()
and personally prefer it this way. I thought the special-casing made the code
more error-prone and harder to understand.

> > From 78ad9e022b95e024ff5bfa96af78e9e44730c970 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Mon, 28 Oct 2024 11:42:10 -0400
> > Subject: [PATCH v1 7/9] Make heap_vac_scan_next_block() return BlockNumber
>
>
> > @@ -857,7 +857,8 @@ lazy_scan_heap(LVRelState *vacrel)
> > vacrel->next_unskippable_allvis = false;
> > vacrel->next_unskippable_vmbuffer = InvalidBuffer;
> >
> > - while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm))
> > + while (BlockNumberIsValid(blkno = heap_vac_scan_next_block(vacrel,
> > + &all_visible_according_to_vm)))
>
> Personally I'd write this as
>
> while (true)
> {
>     BlockNumber blkno;
>
>     blkno = heap_vac_scan_next_block(vacrel, ...);
>
>     if (!BlockNumberIsValid(blkno))
>         break;
>
> Mostly because it's good to use more minimal scopes when possible,
> particularly when previously the scope intentionally was larger. But also
> partially because I don't love variable assignments inside a macro call,
> inside a while().

I changed it to be as you suggest. I will concede that variable assignments
inside a macro call inside a while() are a bit much.

> > From 818d1c3b068c6705611256cfc3eb1f10bdc0b684 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Fri, 1 Nov 2024 18:25:05 -0400
> > Subject: [PATCH v1 8/9] WIP: Add more general summary to vacuumlazy.c
> >
> > Currently the summary at the top of vacuumlazy.c provides some specific
> > details related to the new dead TID storage in 17. I plan to add a
> > summary and maybe some sub-sections to contextualize it.
>
> I like this idea. It's hard to understand vacuumlazy.c without already
> understanding vacuumlazy.c, which isn't a good situation.

I've added a bit more to it in this version, but I likely could use some more
text on index vacuuming. I'm thinking I'll commit something minimal but correct
and let people elaborate more later.

> > ---
> > src/backend/access/heap/vacuumlazy.c | 11 +++++++++++
> > 1 file changed, 11 insertions(+)
> >
> > diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
> > index 7ce69953ba0..15a04c6b10b 100644
> > --- a/src/backend/access/heap/vacuumlazy.c
> > +++ b/src/backend/access/heap/vacuumlazy.c
> > @@ -3,6 +3,17 @@
> > * vacuumlazy.c
> > * Concurrent ("lazy") vacuuming.
> > *
> > + * Heap relations are vacuumed in three main phases. In the first phase,
> > + * vacuum scans relation pages, pruning and freezing tuples and saving dead
> > + * tuples' TIDs in a TID store. If that TID store fills up or vacuum finishes
> > + * scanning the relation, it progresses to the second phase: index vacuuming.
> > + * After index vacuuming is complete, vacuum scans the blocks of the relation
> > + * indicated by the TIDs in the TID store and reaps the dead tuples, freeing
> > + * that space for future tuples. Finally, vacuum may truncate the relation if
> > + * it has emptied pages at the end. XXX: this summary needs work.
>
> Yea, at least we ought to mention that the phasing can be different when there
> are no indexes and that the later phases can heuristically be omitted when
> there aren't enough dead items.

I've done this.

> > From f21f0bab1dbe675be4b4dddcb2eea486d8a69d36 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Mon, 28 Oct 2024 12:15:08 -0400
> > Subject: [PATCH v1 9/9] Eagerly scan all-visible pages to amortize aggressive
> > vacuum
> >
> > Introduce semi-aggressive vacuums, which scan some of the all-visible
> > but not all-frozen pages in the relation to amortize the cost of an
> > aggressive vacuum.
>
> I wonder if "aggressive" is really the right terminology going
> forward... Somehow it doesn't seem particularly descriptive anymore if, in
> many workloads, almost all vacuums are going to be aggressive-ish.

I've changed it to normal, eager, and aggressive.

> > diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
> > index 15a04c6b10b..adabb5ff5f1 100644
> > --- a/src/backend/access/heap/vacuumlazy.c
> > +++ b/src/backend/access/heap/vacuumlazy.c
> > + *
> > + * On the assumption that different regions of the table are likely to contain
> > + * similarly aged data, we use a localized failure cap instead of a global cap
> > + * for the whole relation. The failure count is reset on each region of the
> > + * table, comprised of RELSEG_SIZE blocks (or 1/4 of the table size for a
> > + * small table). In each region, we tolerate MAX_SUCCESSIVE_EAGER_SCAN_FAILS
> > + * before suspending eager scanning until the end of the region.
>
> I'm a bit surprised to see such large regions. Why not something finer, in the
> range of a few megabytes? The FSM steers new tuples quite aggressively to the
> start of the table, which means that in many workloads there will be old and
> new data interspersed at the start of the table. Using RELSEG_SIZE sized
> regions for semi-aggressive vacuuming will mean that we'll often not do any
> semi-aggressive processing beyond the start of the relation, as we'll reach
> the failure rate quickly.

I've changed the region size to 32 MB, but I also decreased the allowed
failures to 128 blocks per region (to avoid eagerly scanning too many blocks
when we are failing to freeze them).

This doesn't completely address your concern about missing freezing
opportunities.

However, this version does randomize the selection of the eager scan start
block: the first eagerly scanned block will be somewhere in the first region,
which avoids re-scanning the same unfreezable blocks across multiple vacuums.
I will note that this problem is unlikely to persist across multiple vacuums
anyway. If a page is being modified frequently, it won't be all-visible. For
it to be an issue, you would need this pattern: modify the page, vacuum,
vacuum, modify, vacuum, vacuum (since the first vacuum after the modification
will set the page all-visible).
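
For concreteness, the region sizing and randomized start could look roughly
like this (a sketch only; the macro name, the eager_scan_start field, and the
pg_prng call are illustrative, not verbatim from the patch):

/* 32 MB worth of blocks per eager scan region (illustrative name) */
#define EAGER_SCAN_REGION_SIZE ((BlockNumber) ((32 * 1024 * 1024) / BLCKSZ))

/* Choose the first eagerly scanned block somewhere in the first region. */
vacrel->eager_scan_start =                      /* hypothetical field */
    (BlockNumber) pg_prng_uint64_range(&pg_global_prng_state,
                                       0, EAGER_SCAN_REGION_SIZE - 1);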

> I also find it layering-wise a bit weird to use RELSEG_SIZE, that's really imo
> is just an md.c concept.

Makes sense. New version has a dedicated macro.

> > +/*
> > + * Semi-aggressive vacuums eagerly scan some all-visible but not all-frozen
> > + * pages. Since our goal is to freeze these pages, an eager scan that fails to
> > + * set the page all-frozen in the VM is considered to have "failed".
> > + *
> > + * On the assumption that different regions of the table tend to have
> > + * similarly aged data, once we fail to freeze MAX_SUCCESSIVE_EAGER_SCAN_FAILS
> > + * blocks, we suspend eager scanning until vacuum has progressed to another
> > + * region of the table with potentially older data.
> > + */
> > +#define MAX_SUCCESSIVE_EAGER_SCAN_FAILS 1024
>
> Can this really be a constant, given that the semi-aggressive regions are
> shrunk for small tables?

Good point. This version actually disables eager scans for relations smaller
than a single region.
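
In code terms, something like this (a sketch; the eagerness field and enum
value name are assumptions):

/* Eager scanning is disabled entirely for tables smaller than one region. */
if (vacrel->rel_pages < EAGER_SCAN_REGION_SIZE)
    vacrel->eagerness = VAC_NORMAL;     /* assumed enum value name */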

> > {
> > /* Target heap relation and its indexes */
> > @@ -153,8 +208,22 @@ typedef struct LVRelState
> > + /*
> > + * Whether or not this is an aggressive, semi-aggressive, or unaggressive
> > + * VACUUM. A fully aggressive vacuum must set relfrozenxid >= FreezeLimit
> > + * and therefore must scan every unfrozen tuple. A semi-aggressive vacuum
> > + * will scan a certain number of all-visible pages until it is downgraded
> > + * to an unaggressive vacuum.
> > + */
> > + VacAggressive aggressive;
>
> - why is VacAggressive defined in vacuum.h? Isn't this fairly tightly coupled
> to heapam?

It was because I had vacuum_get_cutoffs() return the aggressiveness. I've
changed this so that the enum can be defined in vacuumlazy.c.

> - Kinda feels like the type should be named VacAggressivness or such?

I changed it to VacEagerness.

> > + /*
> > + * A semi-aggressive vacuum that has failed to freeze too many eagerly
> > + * scanned blocks in a row suspends eager scanning. unaggressive_to is the
> > + * block number of the first block eligible for resumed eager scanning.
> > + */
> > + BlockNumber unaggressive_to;
>
> What's it set to otherwise? What is it set to in aggressive vacuums?

The idea was to set it to 0 for aggressive vacuum and never advance it.

However, for eager vacuum, there was actually a problem with that version of
the patch set: we weren't actually enabling and disabling eager scanning per
region. Instead, we waited until we hit the fail limit and then disabled eager
scanning for a region-size number of blocks. That was effectively a
cooling-off period, as opposed to a region-based approach.

I've changed this in the current version. Now, for eager vacuum, we save the
block number of the next eager scan region in next_eager_scan_region_start and
advance it when we cross over into the next region. Eager scanning is enabled
as long as eager_pages.remaining_fails is > 0; when we cross into a new
region, we reset that counter to re-enable eager scanning if it was disabled.
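
Roughly (a sketch; the fail-limit macro name is an assumption, the fields are
the ones described above):

bool    eager_scan_enabled;

if (next_block >= vacrel->next_eager_scan_region_start)
{
    /* Crossed into a new region: advance the boundary, reset the budget. */
    vacrel->next_eager_scan_region_start += EAGER_SCAN_REGION_SIZE;
    vacrel->eager_pages.remaining_fails = MAX_EAGER_SCAN_FAILS_PER_REGION;
}

/* Eager scanning stays enabled only while this region has budget left. */
eager_scan_enabled = (vacrel->eager_pages.remaining_fails > 0);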

For normal and aggressive vacuum, I set next_eager_scan_region_start to
InvalidBlockNumber to ensure we never trigger region calculations. However,
for aggressive vacuums, I do keep track of all-visible pages scanned using the
same counter in LVRelState that counts eagerly scanned pages. I'm not sure
whether it is confusing that some of this accounting, though labeled as eager
scan accounting, is also used for aggressive vacuums. In fact, in the logs, I
print out all-visible pages scanned -- which will be > 0 for both aggressive
and eager vacuums.

There are tradeoffs between the two options: use the eager scan counters for
all vacuum types, initializing them to different values based on the vacuum
eagerness level, or guard every reference to them by vacuum type (and don't
initialize them to valid-but-special values).

Let me know what you think about using the counters for aggressive vacuum too.

On the topic of the region-based method vs. the cool-off method: with the
region method, if all of the failures are concentrated at the end of a region,
we will start eager scanning again as soon as we enter the next region,
whereas with the cool-off method we would always wait a consistent number of
blocks. But I think the region method is still better. The region cutoff may
be arbitrary, but it produces a consistent amount of extra scanning. What do
you think?

> > + /*
> > + * The number of eagerly scanned blocks a semi-aggressive vacuum failed to
> > + * freeze (due to age) in the current eager scan region. It is reset each
> > + * time we hit MAX_SUCCESSIVE_EAGER_SCAN_FAILS.
> > + */
> > + BlockNumber eager_scanned_failed_frozen;
> > +
> > + /*
> > + * The remaining number of blocks a semi-aggressive vacuum will consider
> > + * eager scanning. This is initialized to EAGER_SCAN_SUCCESS_RATE of the
> > + * total number of all-visible but not all-frozen pages.
> > + */
> > + BlockNumber remaining_eager_scan_successes;
>
> I think it might look better if you just bundled these into a struct like
>
> struct
> {
>     BlockNumber scanned;
>     BlockNumber failed_frozen;
>     BlockNumber remaining_successes;
> } eager_pages;

Done

> > + visibilitymap_count(rel, &orig_rel_allvisible, &orig_rel_allfrozen);
> > + vacrel->remaining_eager_scan_successes =
> > + (BlockNumber) (EAGER_SCAN_SUCCESS_RATE * (orig_rel_allvisible - orig_rel_allfrozen));
> >
> > if (verbose)
> > {
> > - if (vacrel->aggressive)
> > - ereport(INFO,
> > - (errmsg("aggressively vacuuming \"%s.%s.%s\"",
> > - vacrel->dbname, vacrel->relnamespace,
> > - vacrel->relname)));
> > - else
> > - ereport(INFO,
> > - (errmsg("vacuuming \"%s.%s.%s\"",
> > - vacrel->dbname, vacrel->relnamespace,
> > - vacrel->relname)));
> > + switch (vacrel->aggressive)
> > + {
> > + case VAC_UNAGGRESSIVE:
> > + ereport(INFO,
> > + (errmsg("vacuuming \"%s.%s.%s\"",
> > + vacrel->dbname, vacrel->relnamespace,
> > + vacrel->relname)));
> > + break;
> > +
> > + case VAC_AGGRESSIVE:
> > + ereport(INFO,
> > + (errmsg("aggressively vacuuming \"%s.%s.%s\"",
> > + vacrel->dbname, vacrel->relnamespace,
> > + vacrel->relname)));
> > + break;
> > +
> > + case VAC_SEMIAGGRESSIVE:
> > + ereport(INFO,
> > + (errmsg("semiaggressively vacuuming \"%s.%s.%s\"",
> > + vacrel->dbname, vacrel->relnamespace,
> > + vacrel->relname)));
> > + break;
> > + }
>
> Wonder if we should have a function that returns the aggressiveness of a
> vacuum as an already translated string. There are other places where we emit
> the aggressiveness as part of a message, and it's pretty silly to duplicate
> most of the message.

I've added a helper that returns text for the vacuum eagerness level. I used
gettext_noop() to mark it for later translation because I believe the
autovacuum logging uses _() and ereports are translated. But I'm not sure this
is completely right.
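
The shape of the helper, roughly (a sketch; the enum value names here are
assumptions):

static const char *
vac_eagerness_msg(VacEagerness eagerness)
{
    switch (eagerness)
    {
        case VAC_NORMAL:
            return gettext_noop("vacuuming");
        case VAC_EAGER:
            return gettext_noop("eagerly vacuuming");
        case VAC_AGGRESSIVE:
            return gettext_noop("aggressively vacuuming");
    }
    return NULL;                /* keep compiler quiet */
}

/* callers translate at emit time, e.g.: */
ereport(INFO,
        (errmsg("%s \"%s.%s.%s\"", _(vac_eagerness_msg(vacrel->eagerness)),
                vacrel->dbname, vacrel->relnamespace, vacrel->relname)));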

> > @@ -545,11 +668,13 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
> > * Non-aggressive VACUUMs may advance them by any amount, or not at all.
> > */
> > Assert(vacrel->NewRelfrozenXid == vacrel->cutoffs.OldestXmin ||
> > - TransactionIdPrecedesOrEquals(vacrel->aggressive ? vacrel->cutoffs.FreezeLimit :
> > + TransactionIdPrecedesOrEquals(vacrel->aggressive == VAC_AGGRESSIVE ?
> > + vacrel->cutoffs.FreezeLimit :
> > vacrel->cutoffs.relfrozenxid,
> > vacrel->NewRelfrozenXid));
> > Assert(vacrel->NewRelminMxid == vacrel->cutoffs.OldestMxact ||
> > - MultiXactIdPrecedesOrEquals(vacrel->aggressive ? vacrel->cutoffs.MultiXactCutoff :
> > + MultiXactIdPrecedesOrEquals(vacrel->aggressive == VAC_AGGRESSIVE ?
> > + vacrel->cutoffs.MultiXactCutoff :
> > vacrel->cutoffs.relminmxid,
> > vacrel->NewRelminMxid));
> > if (vacrel->skippedallvis)
>
> These are starting to feel somewhat complicated. Wonder if it'd be easier to
> read if they were written as normal ifs.

Did this.
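
That is, roughly (a sketch of the quoted assertions as plain ifs; the
"eagerness" field name is an assumption):

if (vacrel->eagerness == VAC_AGGRESSIVE)
{
    Assert(vacrel->NewRelfrozenXid == vacrel->cutoffs.OldestXmin ||
           TransactionIdPrecedesOrEquals(vacrel->cutoffs.FreezeLimit,
                                         vacrel->NewRelfrozenXid));
    Assert(vacrel->NewRelminMxid == vacrel->cutoffs.OldestMxact ||
           MultiXactIdPrecedesOrEquals(vacrel->cutoffs.MultiXactCutoff,
                                       vacrel->NewRelminMxid));
}
else
{
    Assert(vacrel->NewRelfrozenXid == vacrel->cutoffs.OldestXmin ||
           TransactionIdPrecedesOrEquals(vacrel->cutoffs.relfrozenxid,
                                         vacrel->NewRelfrozenXid));
    Assert(vacrel->NewRelminMxid == vacrel->cutoffs.OldestMxact ||
           MultiXactIdPrecedesOrEquals(vacrel->cutoffs.relminmxid,
                                       vacrel->NewRelminMxid));
}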

> > +/*
> > + * Helper to decrement a block number to 0 without wrapping around.
> > + */
> > +static void
> > +decrement_blkno(BlockNumber *block)
> > +{
> > + if ((*block) > 0)
> > + (*block)--;
> > +}

I've removed it.

> > @@ -956,11 +1094,23 @@ lazy_scan_heap(LVRelState *vacrel)
> > if (!got_cleanup_lock)
> > LockBuffer(buf, BUFFER_LOCK_SHARE);
> >
> > + page_freezes = vacrel->vm_page_freezes;
> > +
> > /* Check for new or empty pages before lazy_scan_[no]prune call */
> > if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, !got_cleanup_lock,
> > vmbuffer))
> > {
> > /* Processed as new/empty page (lock and pin released) */
> > +
> > + /* count an eagerly scanned page as a failure or a success */
> > + if (was_eager_scanned)
> > + {
> > + if (vacrel->vm_page_freezes > page_freezes)
> > + decrement_blkno(&vacrel->remaining_eager_scan_successes);
> > + else
> > + vacrel->eager_scanned_failed_frozen++;
> > + }
> > +
> > continue;
> > }
>
> Maybe I'm confused, but ISTM that remaining_eager_scan_successes shouldn't
> actually be a BlockNumber, given it doesn't actually indicates a specific
> block...

All of the other counters like this in LVRelState are BlockNumbers (see
lpdead_item_pages, missed_dead_pages, etc). I'm fine with not using a
BlockNumber for this, but I would want a justification for why it is different
from the other cumulative counters of block counts.

> I don't understand why we would sometimes want to treat empty pages as a
> failure? They can't fail to be frozen, no?
>
> I'm not sure it makes sense to count them as progress towards the success
> limit either - afaict we just rediscovered free space is the table. That's imo
> separate from semi-aggressive freezing.

That's a great point. In fact, I don't think we could ever have exercised this
code anyway: since we always freeze empty pages, there shouldn't be any
all-visible but not all-frozen empty pages. I've removed this code.

> Storing page_freezes as a copy of vacrel->vm_page_freezes and then checking if
> that increased feels like a somewhat ugly way of tracking if freezing
> happend. There's no more direct way.

This version uses an output parameter to lazy_scan_prune().
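
Roughly like this (a sketch; the new parameter name is an assumption):

/*
 * lazy_scan_prune() reports through an output parameter (assumed name
 * vm_page_frozen) whether it newly set the page all-frozen in the VM.
 */
bool    vm_page_frozen = false;

lazy_scan_prune(vacrel, buf, blkno, page, vmbuffer,
                all_visible_according_to_vm,
                &has_lpdead_items, &vm_page_frozen);

/* count an eagerly scanned page as a success or a failure */
if (was_eager_scanned)
{
    if (vm_page_frozen)
        vacrel->eager_pages.remaining_successes--;
    else
        vacrel->eager_pages.remaining_fails--;
}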

> Why is decrement_blkno() needed? How can we ever get into negative territory?
> Shouldn't eager scanning have been disabled when
> remaining_eager_scan_successes reaches zero and thus prevent
> remaining_eager_scan_successes from ever going below zero?

Right, I've removed this.

> > @@ -1144,7 +1310,65 @@ heap_vac_scan_next_block(LVRelState *vacrel,
> > */
> > bool skipsallvis;
> >
> > - find_next_unskippable_block(vacrel, &skipsallvis);
> > + /*
> > + * Figure out if we should disable eager scan going forward or
> > + * downgrade to an unaggressive vacuum altogether.
> > + */
> > + if (vacrel->aggressive == VAC_SEMIAGGRESSIVE)
> > + {
> > + /*
> > + * If we hit our success limit, there is no need to eagerly scan
> > + * any additional pages. Downgrade the vacuum to unaggressive.
> > + */
> > + if (vacrel->remaining_eager_scan_successes == 0)
> > + vacrel->aggressive = VAC_UNAGGRESSIVE;
> > +
> > + /*
> > + * If we hit the max number of failed eager scans for this region
> > + * of the table, figure out where the next eager scan region
> > + * should start. Eager scanning is effectively disabled until we
> > + * scan a block in that new region.
> > + */
> > + else if (vacrel->eager_scanned_failed_frozen >=
> > + MAX_SUCCESSIVE_EAGER_SCAN_FAILS)
> > + {
> > + BlockNumber region_size,
> > + offset;
> > +
>
> Why are we doing this logic here, rather than after incrementing
> eager_scanned_failed_frozen? Seems that'd limit the amount of times we need to
> run through this logic substantially?

I've moved it there. Actually, all of the logic has been moved around to make
the region method work.

> Hm - isn't consider_eager_scan potentially outdated after
> find_next_unskippable_block() iterated over a bunch of blocks? If
> consider_eager_scan is false, this could very well go into the next
> "semi-aggressive region", where consider_eager_scan should have been
> re-enabled, no?

Yep, great catch. I believe I've fixed this by removing consider_eager_scan
and instead checking whether we have remaining failures in the current region,
resetting the remaining-failures counter whenever we enter a new region.

It also turns out was_eager_scanned had some issues, which I believe I've now
fixed as well.

- Melanie

Attachment Content-Type Size
v2-0001-Rename-LVRelState-frozen_pages.patch text/x-patch 3.1 KB
v2-0002-Make-visibilitymap_set-return-previous-state-of-v.patch text/x-patch 3.5 KB
v2-0003-Count-pages-set-all-visible-and-all-frozen-in-VM-.patch text/x-patch 8.5 KB
v2-0004-Remove-leftover-mentions-of-XLOG_HEAP2_FREEZE_PAG.patch text/x-patch 1.6 KB
v2-0005-Move-vacuum-VM-buffer-release.patch text/x-patch 1.5 KB
v2-0006-Remove-superfluous-next_block-local-variable-in-v.patch text/x-patch 3.1 KB
v2-0007-Make-heap_vac_scan_next_block-return-BlockNumber.patch text/x-patch 5.9 KB
v2-0008-Refactor-vacuum-assert-into-multiple-if-statement.patch text/x-patch 2.4 KB
v2-0009-Add-more-general-summary-to-vacuumlazy.c.patch text/x-patch 3.1 KB
v2-0010-Eagerly-scan-all-visible-pages-to-amortize-aggres.patch text/x-patch 26.5 KB
