Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Treat <rob(at)xzilla(dot)net>
Subject: Re: Eagerly scan all-visible pages to amortize aggressive vacuum
Date: 2024-12-23 17:50:13
Message-ID: CAAKRu_byJgf3wB8ukv6caXROReS1SRZsVPb7RWP+8qPtdDGykw@mail.gmail.com
Lists: pgsql-hackers

Updated v4 attached.

On Wed, Dec 18, 2024 at 4:30 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Tue, Dec 17, 2024 at 5:54 PM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
> > Makes sense. I've attempted to clarify as you suggest in v3.
>
> I would just commit 0001. There's nothing to be gained by waiting around.

Done.

> I don't care about 0002 much. It doesn't seem particularly better or
> worse. I would suggest that if you want to do this, maybe don't split
> up this:
>
> vacrel->blkno = InvalidBlockNumber;
> if (BufferIsValid(vmbuffer))
> ReleaseBuffer(vmbuffer);
>
> You could put the moved code before or after that instead of in the
> middle of it. Unless there's some reason it makes more sense in the
> middle.

Good point. Anyway, I've dropped all the patches except 0006 and 0007
since no one seems to really like them.

> I dislike 0004 as presented. Reducing the scope of blkno is a false
> economy; the function doesn't do much of anything interesting after
> the main loop. And why do we want to report rel_pages to the progress
> reporting machinery instead of blkno? I'd rather that the code report
> where it actually ended up (blkno) rather than reporting where it
> thinks it must have ended up (rel_pages).

Well, part of the reason I didn't like it is that, because we start
from 0, we have to artificially set blkno to rel_pages anyway, since we
never actually scan a block with BlockNumber == rel_pages. In the
loop, pgstat_progress_update_param() passes blkno before it actually
scans blkno, so I think of it as reporting that we have scanned a total
of blkno blocks so far. That's why it seems weird to me to use blkno to
indicate the number of blocks scanned outside the loop.
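
To make the ordering concrete, here's the pattern I mean (a schematic
sketch, not the actual vacuumlazy.c code):

for (blkno = 0; blkno < rel_pages; blkno++)
{
    /*
     * Reported before block blkno is scanned, so the counter really
     * means "blkno blocks (0 .. blkno - 1) scanned so far".
     */
    pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);

    /* ... prune/freeze block blkno ... */
}

/*
 * Here blkno == rel_pages even though no block with that number exists,
 * so reporting blkno is effectively reporting rel_pages anyway.
 */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);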

Anyway, I'm fine with setting this aside for now if you feel it is
more confusing with rel_pages instead of blkno.

> I agree that the assert that 0005 replaces is confusing, but replacing
> 8 lines of code with 37 is not an improvement in my book.

Right, yes. It is long. I dropped it along with the others.

> I like 0006. The phase I-II-III terminology doesn't appear anywhere in
> the code (at least, not to my knowledge) but we speak of it that way
> so frequently in email and in conversation that mentioning it
> someplace that a future developer might find seems advisable. I think
> it would be worth mentioning in the second paragraph of the comment
> that we may resume phase I after completing phase III if we entered
> phase II due to lack of memory. I'm not sure what the right way to
> phrase this is -- in a sense, this possibility means that these aren't
> really phases at all, however much we may speak of them that way. But
> as I say, we talk about them that way all the time, so it's good that
> this is finally adding comments to match.

They are more like states in a state machine, I suppose. We've all
been calling them phases for so long, though, that it's probably best
to go with that. I've added more about how we can return to earlier
phases, along with a small note that this makes them more like states
even though we've always called them phases.
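
For reference, the flow I tried to capture is roughly this (paraphrasing,
not the committed comment text):

/*
 * Phase I:   scan heap pages, prune them, and accumulate dead TIDs
 * Phase II:  vacuum indexes, deleting the index entries for those TIDs
 * Phase III: vacuum heap pages, marking the dead TIDs unused
 *
 * If the dead-TID storage fills up during phase I, we run phases II and
 * III on what we have so far and then resume phase I where we left off,
 * which is why these behave more like states than strictly ordered phases.
 */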

> Regarding 0007:
>
> - Why do we measure failure as an integer total but success as a
> percentage? Maybe the thought here is that failures are counted per
> region but successes are counted globally across the relation, but
> then I wonder if that's what we want, and if so, whether we need some
> comments to explain why we want that rather than doing both per
> region.

I've added a comment about this to the top of the file. The idea is
that if you don't cap successes, you won't end up amortizing anything.
However, you don't want to stop yourself from freezing the data at
the beginning of the table if you are succeeding, especially given
that append-mostly workloads will see the most benefit from this
feature.

I did wonder whether we should also have some sort of global failure
limit to cap the total pages scanned, but having both a global and a
per-region failure limit seemed like too much extra complexity (and
it's unclear what the global limit should be set to). It's probably
better to make the failures per region configurable.
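
To illustrate the accounting (just a sketch; the names here are
illustrative, not necessarily what the patch uses):

#include <stdbool.h>

/* Illustrative only; the field names are not necessarily the patch's. */
typedef struct EagerScanCounters
{
    int remaining_successes;   /* one cap across the whole relation */
    int remaining_fails;       /* capped per region, reset at each new region */
} EagerScanCounters;

static bool
may_eager_scan(const EagerScanCounters *counters)
{
    /* Global cap: once we've frozen our target share of the table, stop. */
    if (counters->remaining_successes == 0)
        return false;

    /* Per-region cap: give up on a region that keeps failing to freeze. */
    if (counters->remaining_fails == 0)
        return false;

    return true;
}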

> - I do not like the nested struct definition for eager_pages. Either
> define the struct separately, give it a name, and then refer to that
> name here, or just define the elements individually, and maybe give
> them a common prefix or something. I don't think what you have here is
> a style we normally use. As you can see, pgindent is not particularly
> fond of it. It seems particularly weird given that you didn't even put
> all the stuff related to eagerness inside of it.

I don't mind changing it. This version has them prefixed with "eager" instead.
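
That is, roughly this change in LVRelState (the member names are made up
just to show the shape):

/* before: anonymous nested struct */
struct
{
    BlockNumber scanned;
    BlockNumber frozen;
} eager_pages;

/* after: flat fields with a common prefix */
BlockNumber eager_scanned_pages;
BlockNumber eager_frozen_pages;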

> - It's a little weird that you mostly treat eager vacuums as an
> intermediate position between aggressive and normal, but then we
> decide whether we're eager in a different place than we decide whether
> we're aggressive.

Yes, originally I had the code that is now in
heap_vacuum_set_up_eagerness() in vacuum_get_cutoffs(), but I moved it
after noticing that vacuum_get_cutoffs() is in vacuum.c, which is
technically AM-agnostic. I didn't want to pollute it too much with the
eagerness logic, which is actually used in heap-specific code.

With the other changes to eliminate the idea of a separate eager
vacuum in this version, I think this is no longer an issue.

> - On a related note, won't most vacuums be VAC_EAGER rather than
> VAC_NORMAL, thus making VAC_NORMAL a misnomer? I wonder if it's better
> to merge eager and normal together, and just treat the cases where we
> judge eager scanning not worthwhile as a mild special case of an
> otherwise-normal vacuum. It's important that a user can tell what
> happened from the log message, but it doesn't seem absolutely
> necessary for the start-of-vacuum message to make that clear. It could
> just be that %u all-visible scanned => 0 all-visible scanned means we
> didn't end up being at all eager.

I can see this. In this version, I've eliminated the concept of eager
vacuums and reverted the LVRelState flag back to a boolean indicating
aggressive or non-aggressive. Normal vacuums may eagerly scan some pages
if they qualify. If so, the eager scan management state is set up in
the LVRelState. Aggressive vacuums and normal vacuums with eager
scanning disabled have all of this state set to values indicating
eager scanning is disabled.
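
Something like this shape in the setup code (a simplified fragment with
hypothetical names, not the patch verbatim):

/*
 * Sketch with hypothetical names: both aggressive vacuums and tables
 * where eager scanning doesn't qualify get "disabled" values, so the
 * rest of lazy vacuum needs no separate vacuum type.
 */
if (vacrel->aggressive || !eager_scanning_qualifies)
{
    vacrel->eager_scan_remaining_successes = 0;   /* never eager scan */
    vacrel->eager_scan_remaining_fails = 0;
}
else
{
    vacrel->eager_scan_remaining_successes = success_cap;     /* whole relation */
    vacrel->eager_scan_remaining_fails = fails_per_region;    /* reset each region */
}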

> - Perhaps it's unfair of me, but I think I would have hoped for an
> acknowledgement in this commit message, considering that I believe I
> was the one who suggested breaking the relation into logical regions,
> trying to freeze a percentage of the all-visible-but-not-all-frozen
> pages, and capping both successes and failures. Starting the region at
> a random offset wasn't my idea, and the specific thresholds you've
> chosen were not the ones I suggested, and limiting successes globally
> rather than per-region was not what I think I had in mind, and I don't
> mean to take away from everything you've done to move this forward,
> but unless I am misunderstanding the situation, this particular patch
> (0007) is basically an implementation of an algorithm that, as far as
> I know, I was the first to propose.

Ah, you're right. This was an oversight that I believe I've corrected
in the attached version's commit message.

> - Which of course also means that I tend to like the idea, but also
> that I'm biased. Still, what is the reasonable alternative to this
> patch? I have a hard time believing that it's "do nothing". As far as
> I know, pretty much everyone agrees that the large burst of work that
> tends to occur when an aggressive vacuum kicks off is extremely
> problematic, particularly but not only the first time it kicks off on
> a table or group of tables that may have accumulated many
> all-visible-but-not-all-frozen pages. This patch might conceivably err
> in moving that work too aggressively to earlier vacuums, thus making
> those vacuums too expensive or wasting work if the pages end up being
> modified again; or it might conceivably err in moving work
> insufficiently aggressively to earlier vacuums, leaving too much
> remaining work when the aggressive vacuum finally happens. In fact, I
> would be surprised if it doesn't have those problems in some
> situations. But it could have moderately severe cases of those
> problems and still be quite a bit better than what we have now
> overall.
>
> - So, I think we should avoid fine-tuning this and try to understand if
> there's anything wrong with the big picture. Can we imagine a user who
> is systematically unhappy with this change? Like, not a user who got
> hosed once because of some bad luck, but someone who is constantly and
> predictably getting hosed? They'd need to be accumulating lots of
> all-visible-not-all-frozen pages in relatively large tables on a
> regular basis, but then I guess they'd need to either drop the tables
> before the aggressive vacuum happened, or they'd need to render the
> pages not-all-visible again before the aggressive vacuum would have
> happened. I'm not entirely sure how possible that is. My best guess is
> that it's possible if the timing of your autovacuum runs is
> particularly poor -- you just need to load some data, vacuum it early
> enough that the XIDs are still young so it doesn't get frozen, then
> have the eager vacuum hit it, and then update it. That doesn't seem
> impossible, but I'm not sure if it's possible to make it happen often
> enough and severely enough to really cause a problem. And I'm not sure
> we're going to find that out before this is committed.

I suppose in the worst case, if the timings all align poorly and
you've set your autovacuum_freeze_max_age/vacuum_freeze_table_age very
high and vacuum_freeze_min_age very low, you could end up uselessly
freezing the same page multiple times before aggressive vacuuming.

If you cycle through modifying a page, vacuuming it, setting it
all-visible, and eagerly scanning and freezing it multiple times
before an aggressive vacuum, this would be a lot of extra useless
freezing. It seems difficult to do because the page will likely be
frozen the first time you vacuum it if vacuum_freeze_min_age is set
sufficiently low.

The other "worst case" is just that you always scan and fail to freeze
an extra 3% of the relation while vacuuming the table. This one is
much easier to achieve. As such, it seems worthwhile to add a GUC and
table option to tune EAGER_SCAN_MAX_FAILS_PER_REGION so that you
can disable eager scanning altogether (or increase or decrease how
aggressive it is).
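
For instance (purely hypothetical name and default; nothing like this
exists in the patch yet), a knob replacing the compile-time constant
could make 0 mean "disabled":

#include <stdbool.h>

/* Hypothetical GUC/reloption replacing EAGER_SCAN_MAX_FAILS_PER_REGION */
int vacuum_eager_scan_max_fails_per_region = 128;   /* made-up default */

static bool
eager_scanning_enabled(void)
{
    /* 0 disables eager scanning; larger values allow more failed attempts. */
    return vacuum_eager_scan_max_fails_per_region > 0;
}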

- Melanie

Attachment Content-Type Size
v4-0001-Add-more-general-summary-to-vacuumlazy.c.patch text/x-patch 3.6 KB
v4-0002-Eagerly-scan-all-visible-pages-to-amortize-aggres.patch text/x-patch 24.9 KB
