Re: pg15b2: large objects lost on upgrade

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Shruthi Gowda <gowdashru(at)gmail(dot)com>
Subject: Re: pg15b2: large objects lost on upgrade
Date: 2022-08-03 19:41:45
Message-ID: CAH2-Wzk8pi5BeejKNR09rXnpWbCZmFzxJpRraRq8xCwNGCkVcg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 3, 2022 at 6:59 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I don't really like this approach. Imagine that the code got broken in
> such a way that relfrozenxid and relminmxid were set to a value chosen
> at random - say, the contents of 4 bytes of unallocated memory that
> contained random garbage. Well, right now, the chances that this would
> cause a test failure are nearly 100%. With this change, they'd be
> nearly 0%.

If that kind of hypothetical bug existed, and somehow triggered before
the concurrent autovacuum ran (which seems very likely to be the
source of the test flappiness), then it would most likely still be
caught. VACUUM itself has the following defenses:

* The defensive "can't happen" errors added to
heap_prepare_freeze_tuple() and related freezing routines by commit
699bf7d0 in 2017, as hardening following the "freeze the dead" bug.
Those errors catch XIDs that precede the relfrozenxid in effect at the
start of the VACUUM (ditto for MXIDs/relminmxid).

* The assertion added in my recent commit 0b018fab, which verifies
that we're about to set relfrozenxid to something sane.

* VACUUM now warns when it sees a *previous* relfrozenxid that's
apparently "in the future", following recent commit e83ebfe6. This
problem scenario is associated with several historic bugs in
pg_upgrade, where for one reason or another it failed to carry forward
correct relfrozenxid and/or relminmxid values for a table (see the
commit message for references to those old pg_upgrade bugs).
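
To make the first and third defenses above concrete, here is a rough,
self-contained sketch of the kind of wraparound-aware comparison they
rest on. It is not the actual PostgreSQL code: the function names
(xid_precedes, xid_violates_relfrozenxid, relfrozenxid_in_future) are
hypothetical, loosely modeled on TransactionIdPrecedes() and
TransactionIdIsNormal(), and the only fact assumed from the real system
is that XIDs are 32-bit counters compared modulo 2^32, with values
below 3 reserved as special.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for a 32-bit transaction ID. */
typedef uint32_t TransactionId;

/* XIDs below this value are special (invalid, bootstrap, frozen). */
#define FIRST_NORMAL_XID ((TransactionId) 3)

static bool
xid_is_normal(TransactionId xid)
{
    return xid >= FIRST_NORMAL_XID;
}

/*
 * Wraparound-aware "a is older than b". Normal XIDs are compared
 * modulo 2^32 via a signed 32-bit difference; special XIDs fall back
 * to a plain unsigned comparison.
 */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    if (!xid_is_normal(a) || !xid_is_normal(b))
        return a < b;
    return (int32_t) (a - b) < 0;
}

/*
 * Hypothetical check in the spirit of the "can't happen" errors: a
 * normal tuple XID that is older than the table's relfrozenxid at the
 * start of VACUUM indicates corruption.
 */
static bool
xid_violates_relfrozenxid(TransactionId tuple_xid, TransactionId relfrozenxid)
{
    return xid_is_normal(tuple_xid) && xid_precedes(tuple_xid, relfrozenxid);
}

/*
 * Hypothetical check in the spirit of the "in the future" warning: a
 * previously stored relfrozenxid that is newer than the next XID to
 * be assigned cannot be correct.
 */
static bool
relfrozenxid_in_future(TransactionId relfrozenxid, TransactionId next_xid)
{
    return xid_precedes(next_xid, relfrozenxid);
}
```

For example, with a next XID of 1000, a stored relfrozenxid of 5000
would be flagged by relfrozenxid_in_future(), while 900 would not.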

It might make sense to run a manual VACUUM right at the end of the
test, so that you reliably get this kind of coverage, even without
autovacuum.

--
Peter Geoghegan
