Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Robert Treat <rob(at)xzilla(dot)net>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: Eagerly scan all-visible pages to amortize aggressive vacuum
Date: 2025-01-23 19:22:25
Message-ID: CA+TgmoaT0x70pu8WQ75p8k9QMZ9NzZQhNGvG4J1zbHDA5gvZ1g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 23, 2025 at 1:31 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Jan 22, 2025 at 5:48 PM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
> > Thanks! Attached v9 incorporates all your suggested changes.
>
> I'm not exactly sure what to do about it, but I feel like the
> documentation of vacuum_eager_scan_max_fails is going to be
> incomprehensible to someone who doesn't already have a deep knowledge
> of how this patch works.

[ hit "Send" WAY too early ]

There's a lot of magic numbers here: 4096 blocks per region, 128 fails
per region, some other number of successes per region. I don't know
why this particular one is configurable and none of the others are.
I'm not saying it's the wrong decision, but it's not clear to me what
the reasoning is.

Also, the units of this parameter are pretty terrible. Like, it could
be measured in some unit that involves bytes or kilobytes or
megabytes, or, maybe better, it could be measured as a percentage.
Instead of either of those things, it's an integer that only makes
sense in reference to another (hard-coded, magical) integer. If you
made this a percentage, then you wouldn't necessarily even have to
tell people that the magic internal number is 4096. Or at least that
knowledge would be a lot less critical.

+ <command>VACUUM</command> typically scans pages that have been
+ modified since the last vacuum. While some all-visible but not
+ all-frozen pages are eagerly scanned to try and freeze them, the
+ <structfield>relfrozenxid</structfield> can only be advanced when
+ every page of the table that might contain unfrozen XIDs is scanned.

The phrase "While some all-visible but not all-frozen pages are
eagerly scanned to try and freeze them" seems odd here, because what
follows is true either way. This is like saying "When it's Tuesday, I
get hungry if I don't eat any food."

+ /*
+ * A normal vacuum that has failed to freeze too many eagerly scanned
+ * blocks in a row suspends eager scanning. next_eager_scan_region_start
+ * is the block number of the first block eligible for resumed eager
+ * scanning.
+ *
+ * When eager scanning is permanently disabled, either initially
+ * (including for aggressive vacuum) or due to hitting the success limit,
+ * this is set to InvalidBlockNumber.
+ */

This and the other comments for the new LVRelState members are quite
long, but I really like them. Arguably we ought to have more of this
kind of thing.

+ /*
+ * We only want to enable eager scanning if we are likely to be able to
+ * freeze some of the pages in the relation. We can freeze tuples older
+ * than the visibility horizon calculated at the beginning of vacuum, but
+ * we are only guaranteed to freeze them if at least one tuple on the page
+ * precedes the freeze limit or multixact cutoff (calculated from
+ * vacuum_[multixact_]freeze_min_age). So, if the oldest unfrozen xid
+ * (relfrozenxid/relminmxid) does not precede the freeze cutoff, we aren't
+ * likely to freeze many tuples.
+ */

I think this could be clearer. And I wonder if "We can freeze tuples
older" is actually a typo for "We can freeze tuples newer", because
otherwise I can't make sense of the rest of the sentence. Right now it
seems to boil down to something like: We can freeze tuples older...but
we are only guaranteed to freeze them if they are older.

Maybe what we want to say here is something like: We only want to
enable eager scanning if we are likely to be able to freeze some of
the pages in the relation. If FreezeLimit has not advanced past
relfrozenxid, or if MultiXactCutoff has not advanced passed
relminmxid, then there's a good chance we won't be able to freeze
anything more than we already have. Success would not be impossible,
because there may be pages we didn't try to freeze previously, and
it's also possible that some XID greater than FreezeLimit has ended,
allowing for freezing. But as a heuristic, we wait until the
FreezeLimit advances, so that we won't repeatedly attempt eager
freezing while the same long-running transaction blocks progress every
time.

+ ereport(INFO,
+ (errmsg("Vacuum successfully froze %u eager scanned blocks of
\"%s.%s.%s\". Now disabling eager scanning.",

I predict that if Tom sees this, you will get complaints about both
the wording of the message, which pretty clearly does not conform to
style guidelines for a primary error message, and also about the use
of the INFO level. Allow me to suggest DEBUG1 or DEBUG2 and "disabling
eager scanning after freezing %u eagerly scanned blocks".

+ /*
+ * Aggressive vacuums cannot skip all-visible pages that are not also
+ * all-frozen. Normal vacuums with eager scanning enabled only skip
+ * such pages if they have hit the failure limit for the current eager
+ * scan region.
+ */
+ if (vacrel->aggressive ||
+ vacrel->eager_scan_remaining_fails > 0)
+ {
+ if (!vacrel->aggressive)
+ next_unskippable_eager_scanned = true;
+ break;
}

I think this would be clearer as two separate if statements. if
(aggressive) break; if (vacrel->eager_scan_remaining_fails) {
next_unskippable_eager_scanned = true; break; }

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2025-01-23 19:28:58 Re: Add CASEFOLD() function.
Previous Message Masahiko Sawada 2025-01-23 19:07:41 Re: Skip collecting decoded changes of already-aborted transactions