From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
Cc: | "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Floris Van Nee <florisvannee(at)optiver(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie> |
Subject: | Re: visibility map corruption |
Date: | 2021-07-24 00:08:52 |
Message-ID: | 20210724000852.GD8025@momjian.us |
Lists: | pgsql-hackers |
On Thu, Jul 8, 2021 at 09:51:47AM -0400, Bruce Momjian wrote:
> On Thu, Jul 8, 2021 at 08:11:14AM -0500, Justin Pryzby wrote:
> > Also, the pg_upgrade status message still seems to be misplaced:
> >
> > In 20210706190612(dot)GM22043(at)telsasoft(dot)com, Justin Pryzby wrote:
> > > I re-arranged the pg_upgrade output of that patch: it was in the middle of the
> > > two halves: "Setting next transaction ID and epoch for new cluster"
> >
> > +++ b/src/bin/pg_upgrade/pg_upgrade.c
> > @@ -473,6 +473,12 @@ copy_xact_xlog_xid(void)
> > "\"%s/pg_resetwal\" -f -x %u \"%s\"",
> > new_cluster.bindir, old_cluster.controldata.chkpnt_nxtxid,
> > new_cluster.pgdata);
> > + check_ok();
> > + prep_status("Setting oldest XID for new cluster");
> > + exec_prog(UTILITY_LOG_FILE, NULL, true, true,
> > + "\"%s/pg_resetwal\" -f -u %u \"%s\"",
> > + new_cluster.bindir, old_cluster.controldata.chkpnt_oldstxid,
> > + new_cluster.pgdata);
> > exec_prog(UTILITY_LOG_FILE, NULL, true, true,
> > "\"%s/pg_resetwal\" -f -e %u \"%s\"",
> > new_cluster.bindir, old_cluster.controldata.chkpnt_nxtepoch,
>
> Wow, you are 100% correct. Updated patch attached.
OK, I have the patch ready to apply to all supported Postgres versions,
and it passes all my cross-version pg_upgrade tests.
However, I am now stuck on the commit message text, and I think this is
the point Peter Geoghegan was trying to make earlier --- we know that
preserving the oldest xid in pg_control is the right thing to do, and
that setting it to the current xid - 2 billion (the old behavior)
causes vacuum freeze to run on all tables, but what else does this patch
affect?
As far as I know, seeing a very low oldest xid causes autovacuum to
check all objects and make sure the age of their relfrozenxid is less
than autovacuum_freeze_max_age, but isn't that just a check? Would that
cause any table scans? I would think not. And would this cause
incorrect truncation of pg_xact or fsm or vm files? I would think not,
either.
Even if the old and new cluster had mismatched autovacuum_freeze_max_age
values, I don't see how that would cause any corruption either.
I could perhaps see corruption happening if pg_control's oldest xid
value was closer to the current xid value than it should be, but I can't
see how having it 2 billion away could cause harm, unless perhaps
pg_upgrade itself used enough xids to push the counter more than 2^31
away from the oldest xid recorded in pg_control.
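
To make the 2^31 concern concrete, here is a minimal sketch of modulo-2^32
xid comparison and of the headroom left when the oldest xid is set to the
current xid - 2 billion. The helper names xid_precedes() and xid_headroom()
are hypothetical and are not taken from the patch or from the PostgreSQL
source; the sketch also ignores the special (non-normal) xids and assumes
the usual two's-complement conversion from uint32_t to int32_t.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns nonzero if xid a is logically older than
 * xid b when both are compared modulo 2^32.  Special (non-normal) xids
 * are not handled here.  The unsigned-to-signed conversion is assumed to
 * wrap in the usual two's-complement way.
 */
static int
xid_precedes(uint32_t a, uint32_t b)
{
	int32_t		diff = (int32_t) (a - b);

	return diff < 0;
}

/*
 * Hypothetical helper: how many more xids can be consumed past next_xid
 * while oldest_xid still compares as older than the counter.  Returns 0
 * if the distance already exceeds 2^31.
 */
static uint32_t
xid_headroom(uint32_t oldest_xid, uint32_t next_xid)
{
	uint32_t	distance = next_xid - oldest_xid;	/* modulo 2^32 */
	uint32_t	limit = UINT32_C(1) << 31;

	if (distance > limit)
		return 0;
	return limit - distance;
}
```

With illustrative values next xid = 3000000000 and oldest xid = next xid - 2
billion = 1000000000, xid_headroom() gives 147483648, i.e. 2^31 - 2 billion.
Under this comparison the oldest xid would only stop looking "older" than
the counter after that many additional xids had been consumed.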
What I am basically asking is how to document this and what it fixes.
--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.