Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Robert Treat <rob(at)xzilla(dot)net>
Cc: Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: Eagerly scan all-visible pages to amortize aggressive vacuum
Date: 2025-01-22 22:48:46
Message-ID: CAAKRu_YuEfvNGgMaJrSi9DCadsvAL6h6w9b7Sk+Z5Kmy5=HzYg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jan 18, 2025 at 12:10 PM Robert Treat <rob(at)xzilla(dot)net> wrote:
>
> Hey Melanie, took a walk through this version, some minor thoughts below.

Thanks! Attached v9 incorporates all your suggested changes.

> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -9128,9 +9128,10 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
> <command>VACUUM</command> may scan and fail to set all-frozen in the
> visibility map before disabling eager scanning until the next region
> (currently 4096 blocks) of the relation. A value of 0 disables eager
> - scanning altogether. The default is 128. This parameter can be set in
> - postgresql.conf or on the server command line but is overridden for
> - individual tables by changing the
> + scanning altogether. The default is 128. This parameter can only be set
> + in the <filename>postgresql.conf</filename> file or on the server
> + command line; but the setting can be overridden for individual tables
> + by changing the
> <link linkend="reloption-vacuum-eager-scan-max-fails">corresponding
> table storage parameter</link>.
> + For more information see <xref linkend="vacuum-for-wraparound"/>.
> </para>
> </listitem>
>
> The above aligns with the boilerplate text we usually use for these
> types of settings, though typically we just link to the table
> parameter section, but I left the direct link you included, since it
> seems like a nice addition (I'm contemplating doing that for all such
> params, but that should be done separately).

I've made these suggested changes.

> As a complete aside, I think we should re-order these sections to have
> freezing first, then cost delay, then autovac stuff, since I feel like
> most people learn about vacuum first, then build on that with
> autovacuum, not to mention the general priority of managing
> wraparound. Granted, that's not directly germane to eager scanning.

I could see that making sense.

> --- a/doc/src/sgml/maintenance.sgml
> +++ b/doc/src/sgml/maintenance.sgml
> @@ -638,9 +638,9 @@ SELECT datname, age(datfrozenxid) FROM pg_database;
> </tip>
>
> <para>
> - <command>VACUUM</command> mostly scans pages that have been modified since
> - the last vacuum. Some all-visible but not all-frozen pages are eagerly
> - scanned to try and freeze them. But the
> + <command>VACUUM</command> typically scans pages that have been modified since
> + the last vacuum. While some all-visible but not all-frozen pages are eagerly
> + scanned to try and freeze them, the
> <structfield>relfrozenxid</structfield> can only be advanced when every
> page of the table that might contain unfrozen XIDs is scanned. This
>
> above is an attempt to make this wording less awkward.

Thanks! I've adopted this wording.

> wrt this portion of src/backend/access/heap/vacuumlazy.c
> + * pages at the beginning of the vacuum. Once the success cap has been hit,
> + * eager scanning is permanently disabled.
> + *
> Maybe this is obvious enough to the reader, but should we change this
> to something like "eager scanning is disabled for the remainder of
> this vacuum" or similar? I guess I'm trying to make clear that it
> isn't disabled "permanently" or until an aggressive vacuum run
> completes or a vacuum freeze or some other scenario; we can/will eager
> scan again essentially immediately if we just run vacuum again. (It
> seems obvious in the actual code, just less so in the context of the
> introductory wording)

Good point. I've changed this.

> And one last bit of overthinking... in src/backend/access/heap/vacuumlazy.c
> + if (vacrel->rel_pages < VACUUM_EAGER_SCAN_REGION_SIZE)
> + return;
> It's easy to agree that anything less than the region size doesn't
> make sense to eager scan, but I wondered about the "region_size +1"
> scenario; essentially cases where we are not very much larger in total
> than a single region, where it also feels like there isn't much gain
> from eager scanning. Perhaps we should wait until 2x region size, in
> which case we'd at least start in a scenario where the bucketing is
> more equal?

Good idea. Because we start at a random spot within the first region,
the whole first region isn't fully subject to the eager scan
algorithm anyway. I've changed the threshold to 2x the region size.

Circling back to benchmarking, I've been running the most adversarial
benchmarks I could devise and can share a bit of what I've found.

I created a "hot tail" benchmark in which 16 clients insert some data
and then update data that is older than what they just inserted but
still toward the end of the relation. The adversarial part is that I
bulk delete all data older than X hours, where X is chosen so that the
data is already eligible to be frozen but has not yet been aggressively
vacuumed.

That means that there are a bunch of pages that will never be frozen
on master but are frozen with the patch -- wasting vacuum resources. I
tuned vacuum_freeze_min_age and vacuum_freeze_table_age and picked the
DELETE window to specifically have this behavior.

With my patch, I do see a 15-20% increase in the total time spent
vacuuming over the course of the multi-hour benchmark. (I only see a
1% increase in the total WAL volume, though.)

Interestingly, I see an improvement in bulk delete performance.
The deletes are much faster with the patch -- in fact, DELETE p99
latency improves by over 30%. And, looking at pg_stat_io, this seems
to be due to far fewer reads issued by the delete (20% less time
spent on bulkreads in client backends). I imagine this is because
vacuum has more recently read in those pages, so the DELETE finds them
in the kernel buffer cache.

Some of this comes down to timing that varies from run to run of the
benchmark. These numbers vary a bit depending on exactly when the
DELETEs start, when checkpoints happen, and which blocks are updated
(the UPDATE uses a bit of randomness to decide what to update). And, of
course, on machines with different amounts of memory, the performance
boost to the DELETEs is likely to disappear. So, we have to assume that
in the worst case the extra time spent vacuuming comes with no benefit
to offset the cost.

The question is, is this extra time spent vacuuming in the worst case
acceptable?

- Melanie

Attachment Content-Type Size
v9-0001-Eagerly-scan-all-visible-pages-to-amortize-aggres.patch text/x-patch 40.5 KB
