Re: Add 64-bit XIDs into PostgreSQL 15

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Chris Travers <chris(at)orioledata(dot)com>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Chris Travers <chris(dot)travers(at)gmail(dot)com>, Fedor Sigaev <teodor(at)sigaev(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Maxim Orlov <orlovmg(at)gmail(dot)com>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
Subject: Re: Add 64-bit XIDs into PostgreSQL 15
Date: 2022-11-28 21:09:28
Message-ID: CAH2-Wz=CTJ5gBzv0cAS3WTxSf7vwxE0vFwwzSoGdsVhex26-3Q@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 28, 2022 at 8:53 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> It is true that if the table is progressively bloating, it is likely
> to be more bloated by the time you are 8 billion XIDs behind than it
> was when you were 800 million XIDs behind. I don't see that as a very
> good reason not to adopt this patch, because you can bloat the table
> by an arbitrarily large amount while consuming only a small number of
> XIDs, even just 1 XID. Protecting against bloat is good, but shutting
> down the database when the XID age reaches a certain value is not a
> particularly effective way of doing that, so saying that we'll be
> hurting people by not shutting down the database at the point where we
> do so today doesn't ring true to me.

I can't speak for Chris, but I think that almost everybody will agree
on this much, without really having to think about it. It's easy to
see that having more XID space is, in general, strictly a good thing.
If there were a low-risk way of getting that benefit, then I'd be in
favor of it.

Here's the problem that I see with this patch: I don't think that the
risks are commensurate with the benefits. I can imagine being in favor
of an even more invasive patch that (say) totally removes the concept
of freezing, but that would have to be a very different sort of
design.

> Philosophically, I disagree with the idea of shutting down the
> database completely in any situation in which a reasonable alternative
> exists. Losing read and write availability is really bad, and I don't
> think it's what users want.

At a certain point it may make more sense to activate XidStopLimit
protections (which will only prevent new XID allocations) instead of
getting further behind on freezing, even in a world where we're never
strictly obligated to activate XidStopLimit. It may in fact be the
lesser evil, even with 64-bit XIDs -- because we still have to freeze,
and the factors that drive when and how we freeze mostly aren't
changed.
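
To make the stop-limit arithmetic concrete, here is a rough sketch of how
the headroom before XidStopLimit shrinks as the oldest unfrozen XID ages.
It is a simplified model, not the server's actual code: the function names
are hypothetical, and the 3 million XID safety margin is an assumption
about recent releases rather than something stated in this thread.

```python
# Simplified model of 32-bit XID headroom. Names and the margin are
# illustrative assumptions, not PostgreSQL source code.
MAX_XID_AGE = 2**31 - 1        # about 2 billion XIDs before wraparound
ASSUMED_STOP_MARGIN = 3_000_000  # assumed reserve kept back for emergency VACUUM


def xids_until_stop(oldest_xid_age: int) -> int:
    """XIDs that can still be allocated before the stop limit is reached."""
    stop_age = MAX_XID_AGE - ASSUMED_STOP_MARGIN
    return max(0, stop_age - oldest_xid_age)


def can_allocate_xid(oldest_xid_age: int) -> bool:
    """True while new XID allocations would still be permitted."""
    return xids_until_stop(oldest_xid_age) > 0


if __name__ == "__main__":
    for age in (800_000_000, 1_600_000_000, 2_144_000_000, MAX_XID_AGE):
        print(age, xids_until_stop(age), can_allocate_xid(age))
```

Note that the model only looks at XID age. It says nothing about how many
unfrozen pages exist, which is the freeze debt that actually has to be paid
down.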

Fundamentally, when we're falling behind on freezing, at a certain
point we can expect to keep falling behind -- unless some kind of
major shift happens. That's just how freezing works, with or without
64-bit XIDs/MXIDs. If VACUUM isn't keeping up with the allocation of
transactions, then the system is probably misconfigured in some way.
We should do our best to signal this as early and as frequently as
possible, and we should mitigate specific hazards (e.g. old temp
tables) if at all possible. We should activate the failsafe (which,
incidentally, the patch just removes) when things really start to look
dicey. These mitigations may be very effective, but in the final
analysis they don't address the fundamental weakness in freezing.

Granted, the specifics of the current XidStopLimit mechanism are
unlikely to directly carry over to 64-bit XIDs. XidStopLimit is
structured in a way that doesn't actually consider freeze debt in
units like unfrozen pages. Like Chris, I just don't see why the patch
obviates the need for something like XidStopLimit, since the patch
doesn't remove freezing. An improved XidStopLimit mechanism might even
end up kicking in *before* the oldest relfrozenxid reached 2 billion
XIDs, depending on the specifics.

Removing the failsafe mechanism seems misguided to me for similar
reasons. I recently learned that Amazon RDS has set a *lower*
vacuum_failsafe_age default than the standard default (lowering it from
1.6 billion to only 1.2 billion on RDS). This decision predates my
joining AWS. It seems as if practical experience has shown that
allowing any table's age(relfrozenxid) to get too far past a billion
is not a good idea. At least it's not a good idea on modern Postgres
versions that have the freeze map.
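
As a small illustration of the difference between the two defaults, the
sketch below compares a table's age(relfrozenxid) against each threshold.
It is a deliberately simplified model: the helper name is hypothetical, and
the real check inside VACUUM takes other settings into account that are
ignored here.

```python
# Simplified comparison of the two vacuum_failsafe_age defaults mentioned
# above. The helper is hypothetical, not a PostgreSQL API.
STANDARD_FAILSAFE_AGE = 1_600_000_000  # standard default
RDS_FAILSAFE_AGE = 1_200_000_000       # lower default reported for RDS


def failsafe_would_trigger(relfrozenxid_age: int,
                           failsafe_age: int = STANDARD_FAILSAFE_AGE) -> bool:
    """True once the table's age(relfrozenxid) exceeds the failsafe age."""
    return relfrozenxid_age > failsafe_age


if __name__ == "__main__":
    age = 1_300_000_000
    print(failsafe_would_trigger(age))                    # standard default
    print(failsafe_would_trigger(age, RDS_FAILSAFE_AGE))  # RDS default
```

With the lower setting, a table that is 1.3 billion XIDs behind is already
in failsafe territory, whereas the standard default would let it fall a
further 300 million XIDs behind first.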

We really shouldn't have to rely on having billions of XIDs available
in the first place -- XID space isn't really a fungible commodity.
It's much more important to recognize that something (some specific
thing) just isn't working as designed, which in general could be
pretty far removed from freezing. For example, index corruption could
do it (at least without the failsafe). Some kind of autovacuum
starvation could do it. It's almost always more complicated than "not
enough available XID space".

--
Peter Geoghegan
