Quick Links

Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From:	Robert Treat <rob(at)xzilla(dot)net>
To:	Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc:	Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject:	Re: Eagerly scan all-visible pages to amortize aggressive vacuum
Date:	2025-01-18 17:10:23
Message-ID:	CABV9wwPSqfPmL9tEgiNejsgsdE-kHvQjYseqHOtEgY-RL-PhDA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Jan 15, 2025 at 3:56 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
> On Wed, Jan 15, 2025 at 3:03 PM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
> >
> > On Wed, Jan 15, 2025 at 2:36 PM Marcos Pegoraro <marcos(at)f10(dot)com(dot)br> wrote:
> > >>
> > > There is a typo on committed vacuumlazy.c file
> > > * If the TID store fills up in phase I, vacuum suspends phase I, proceeds to
> > > * phases II and II, cleaning up the dead tuples referenced in the current TID
> > > * store. This empties the TID store resumes phase I.
> > >
> > > Probably you meant phases II and III
> >
> > Oops. Thanks for catching this. I'll fix that and the following
> > sentence in the main patch proposed in this thread to avoid extra
> > noise.
>
> Attached v8 is rebased and has various comment cleanups.
>

Hey Melanie, took a walk through this version, some minor thoughts below.

--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9128,9 +9128,10 @@ COPY postgres_log FROM
'/full/path/to/logfile.csv' WITH csv;
<command>VACUUM</command> may scan and fail to set all-frozen in the
visibility map before disabling eager scanning until the next region
(currently 4096 blocks) of the relation. A value of 0 disables eager
- scanning altogether. The default is 128. This parameter can be set in
- postgresql.conf or on the server command line but is overridden for
- individual tables by changing the
+ scanning altogether. The default is 128. This parameter can only be set
+ in the <filename>postgresql.conf</filename> file or on the server
+ command line; but the setting can be overridden for individual tables
+ by changing the
<link linkend="reloption-vacuum-eager-scan-max-fails">corresponding
table storage parameter</link>.
+ For more information see <xref linkend="vacuum-for-wraparound"/>.
</para>
</listitem>

The above aligns with the boiler plate text we usually use for these
types of settings, though typically we just link to the table
parameter section, but I left the direct link you included, since it
seems like a nice addition (I'm contemplating doing that for all such
params, but that should be done separately)

As a complete aside, I think we should re-order these sections to have
freezing first, then cost delay, then autovac stuff, since I feel like
most people learn about vacuum first, then build on that with
autovacuum, not to mention the general priority of managing
wraparound. Granted, that's not directly germane to eager scanning.

--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -638,9 +638,9 @@ SELECT datname, age(datfrozenxid) FROM pg_database;
</tip>

<para>
- <command>VACUUM</command> mostly scans pages that have been modified since
- the last vacuum. Some all-visible but not all-frozen pages are eagerly
- scanned to try and freeze them. But the
+ <command>VACUUM</command> typically scans pages that have been
modified since
+ the last vacuum. While some all-visible but not all-frozen pages
are eagerly
+ scanned to try and freeze them, the
<structfield>relfrozenxid</structfield> can only be advanced when every
page of the table that might contain unfrozen XIDs is scanned. This

above is an attempt to make this wording less awkward.

wrt this portion of src/backend/access/heap/vacuumlazy.c
+ * pages at the beginning of the vacuum. Once the success cap has been hit,
+ * eager scanning is permanently disabled.
+ *
Maybe this is obvious enough to the reader, but should we change this
to something like "eager scanning is disabled for the remainder of
this vacuum" or similar? I guess I'm trying to make clear that it
isn't disabled "permanently" or until an aggressive vacuum run
completes or a vacuum freeze or some other scenario; we can/will eager
scan again essentially immediately if we just run vacuum again. (It
seems obvious in the actual code, just less so in the context of the
introductory wording)

And one last bit of overthinking... in src/backend/access/heap/vacuumlazy.c
+ if (vacrel->rel_pages < VACUUM_EAGER_SCAN_REGION_SIZE)
+ return;
It's easy to agree that anything less than the region size doesn't
make sense to eager scan, but I wondered about the "region_size +1"
scenario; essentially cases where we are not very much larger in total
than a single region, where it also feels like there isn't much gain
from eager scanning. Perhaps we should wait until 2x region size, in
which case we'd at least start in a scenario where the bucketing is
more equal?

Robert Treat
https://xzilla.net

In response to

Re: Eagerly scan all-visible pages to amortize aggressive vacuum at 2025-01-15 20:55:52 from Melanie Plageman

Responses

Re: Eagerly scan all-visible pages to amortize aggressive vacuum at 2025-01-22 22:48:46 from Melanie Plageman

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Joe Conway	2025-01-18 17:32:10	Re: Replace current implementations in crypt() and gen_salt() to OpenSSL
Previous Message	jian he	2025-01-18 17:00:04	Re: Statistics Import and Export