Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Date: 2022-01-21 20:42:23
Message-ID: CAH2-WzkYV+YYne3WRdWT64RyXG_1ckSgx9w8P3N1a1HK93r-zw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 21, 2022 at 12:07 PM Greg Stark <stark(at)mit(dot)edu> wrote:
> This confuses me. "Transactions per second" is a headline database
> metric that lots of users actually focus on quite heavily -- rather
> too heavily imho.

But transactions per second is for the whole database, not for
individual tables. It's also really a benchmarking thing, where the
size and variety of transactions is fixed. With something like pgbench
it actually is exactly the same thing, but such a workload is not at
all realistic. Even BenchmarkSQL/TPC-C isn't like that, despite the
fact that it is a fairly synthetic workload (it's just not super
synthetic).

> Ok, XID consumption is only a subset of transactions
> that are not read-only but that's a detail that's pretty easy to
> explain and users get pretty quickly.

My point was mostly this: the number of distinct extant unfrozen tuple
headers (and the range of the relevant XIDs) is generally highly
unpredictable today. And the number of tuples we'll have to freeze to
be able to advance relfrozenxid by a good amount is quite variable, in
general.

For example, if we bulk extend a relation as part of an ETL process,
then the number of distinct XIDs could be as low as 1, even though we
can expect a great deal of "freeze debt" that will have to be paid off
at some point (with the current design, in the common case where the
user doesn't account for this effect because they're not already an
expert). There are other common cases that are not quite as extreme as
that, that still have the same effect -- even an expert will find it
hard or impossible to tune autovacuum_freeze_min_age for that.

Another case of interest (that illustrates the general principle) is
something like pgbench_tellers. We'll never have an aggressive VACUUM
of the table with the patch, and we shouldn't ever need to freeze any
tuples. But, owing to workload characteristics, we'll constantly be
able to keep its relfrozenxid very current, because (even if we
introduce skew) each individual row cannot go very long without being
updated, allowing old XIDs to age out that way.

There is also an interesting middle ground, where you get a mixture of
both tendencies due to skew. The tuple that's most likely to get
updated was the one that was just updated. How are you as a DBA ever
supposed to tune autovacuum_freeze_min_age if tuples happen to be
qualitatively different in this way?

> What I find confuses people much more is the concept of the
> oldestxmin. I think most of the autovacuum problems I've seen come
> from cases where autovacuum is happily kicking off useless vacuums
> because the oldestxmin hasn't actually advanced enough for them to do
> any useful work.

As it happens, the proposed log output won't use the term oldestxmin
anymore -- I think that it makes sense to rename it to "removable
cutoff". Here's an example:

LOG: automatic vacuum of table "regression.public.bmsql_oorder": index scans: 1
pages: 0 removed, 317308 remain, 250258 skipped using visibility map
(78.87% of total)
tuples: 70 removed, 34105925 remain (6830471 newly frozen), 2528 are
dead but not yet removable
removable cutoff: 37574752, which is 230115 xids behind next
new relfrozenxid: 35221275, which is 5219310 xids ahead of previous value
index scan needed: 55540 pages from table (17.50% of total) had
3339809 dead item identifiers removed
index "bmsql_oorder_pkey": pages: 144257 in total, 0 newly deleted, 0
currently deleted, 0 reusable
index "bmsql_oorder_idx2": pages: 330083 in total, 0 newly deleted, 0
currently deleted, 0 reusable
I/O timings: read: 7928.207 ms, write: 1386.662 ms
avg read rate: 33.107 MB/s, avg write rate: 26.218 MB/s
buffer usage: 220825 hits, 443331 misses, 351084 dirtied
WAL usage: 576110 records, 364797 full page images, 2046767817 bytes
system usage: CPU: user: 10.62 s, system: 7.56 s, elapsed: 104.61 s

Note also that I deliberately made the "new relfrozenxid" line that
immediately follows (information that we haven't shown before now)
similar, to highlight that they're now closely related concepts. Now
if you VACUUM a table that is either empty or has only frozen tuples,
VACUUM will set relfrozenxid to oldestxmin/removable cutoff.
Internally, oldestxmin is the "starting point" for our final/target
relfrozenxid for the table. We ratchet it back dynamically, whenever
we see an older-than-current-target XID that cannot be immediately
frozen (e.g., when we can't easily get a cleanup lock on the page).

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2022-01-21 20:53:35 Re: How to get started with contribution
Previous Message Tomas Vondra 2022-01-21 20:41:55 Re: How to get started with contribution