From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | Jason Harvey <jason(at)reddit(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: pg_upgrade can result in early wraparound on databases with high transaction load |
Date: | 2021-04-23 23:42:56 |
Message-ID: | 20210423234256.hwopuftipdmp3okf@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-hackers |
Hi,
On 2019-06-15 11:37:59 -0700, Noah Misch wrote:
> On Tue, May 21, 2019 at 03:23:00PM -0700, Peter Geoghegan wrote:
> > On Mon, May 20, 2019 at 3:10 AM Jason Harvey <jason(at)reddit(dot)com> wrote:
> > > This week I upgraded one of my large(2.8TB), high-volume databases from 9 to 11. The upgrade itself went fine. About two days later, we unexpectedly hit transaction ID wraparound. What was perplexing about this was that the age of our oldest `datfrozenxid` was only 1.2 billion - far away from where I'd expect a wraparound. Curiously, the wraparound error referred to a mysterious database of `OID 0`:
> > >
> > > UPDATE ERROR: database is not accepting commands to avoid wraparound data loss in database with OID 0
>
> That's bad.
Yea. The code triggering it in pg_resetwal is bogus as far as I can
tell. That pg_upgrade triggers it makes this quite bad.
I just hit issues related to it when writing a wraparound handling
test. Peter remembered this issue (how?)...
Especially before 13 (inserts triggering autovacuum) it is quite common
to have tables that only ever get vacuumed due to anti-wraparound
vacuums. And it's common for larger databases to increase
autovacuum_freeze_max_age. Which makes it fairly likely for this to
guess an oldestXid value that's *newer* than an accurate one. Since
oldestXid is used in a few important-ish places (like triggering
vacuums, and in 14 also some snapshot related logic) I think that's bad.
The relevant code:
if (set_xid != 0)
{
ControlFile.checkPointCopy.nextXid =
FullTransactionIdFromEpochAndXid(EpochFromFullTransactionId(ControlFile.checkPointCopy.nextXid),
set_xid);
/*
* For the moment, just set oldestXid to a value that will force
* immediate autovacuum-for-wraparound. It's not clear whether adding
* user control of this is useful, so let's just do something that's
* reasonably safe. The magic constant here corresponds to the
* maximum allowed value of autovacuum_freeze_max_age.
*/
ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;
ControlFile.checkPointCopy.oldestXidDB = InvalidOid;
}
Originally from:
commit 25ec228ef760eb91c094cc3b6dea7257cc22ffb5
Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Date: 2009-08-31 02:23:23 +0000
Track the current XID wrap limit (or more accurately, the oldest unfrozen
XID) in checkpoint records. This eliminates the need to recompute the value
from scratch during database startup, which is one of the two remaining
reasons for the flatfile code to exist. It should also simplify life for
hot-standby operation.
I think we should remove the oldestXid guessing logic, and expose it as
an explicit option. I think it's important that pg_upgrade sets an
accurate value. Probably not worth caring about oldestXidDB though?
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2021-04-24 00:06:17 | Re: BRIN index on timestamptz |
Previous Message | Tom Lane | 2021-04-23 23:12:41 | Re: server process exited with code 1 |
From | Date | Subject | |
---|---|---|---|
Next Message | Justin Pryzby | 2021-04-24 00:28:27 | Re: pg_upgrade can result in early wraparound on databases with high transaction load |
Previous Message | Alvaro Herrera | 2021-04-23 23:31:44 | Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY |