Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts
Date: 2014-07-20 22:30:10
Message-ID: 20140720223010.GF5974@alap3.anarazel.de
Lists: pgsql-bugs

On 2014-07-20 18:16:51 -0400, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > On 2014-07-20 17:43:04 -0400, Tom Lane wrote:
> >> No, I don't think so. Truncation is driven off oldestMultiXid from
> >> pg_control, not from relminmxid. The only thing in-the-future values of
> >> those will do to us is prevent autovacuum from thinking it must do a full
> >> table scan. (In particular, in-the-future values do not cause
> >> oldestMultiXid to get advanced, because we're always looking for the
> >> oldest value not the newest.)
>
> > Right. But that's the problem. If oldestMulti is set to, say, 3000000000
> > by pg_resetxlog during pg_upgrade but *minmxid = 1, those tables won't be
> > full-table scanned because of multixacts. But vac_truncate_clog() will
> > SetMultiXactIdLimit(minMulti, minmulti_datoid);
> > regardless.
>
> > Note that it'll not notice the limit of other databases in this case
> > because vac_truncate_clog() will effectively use the in memory
> > GetOldestMultiXactId() and check if other databases are before that. But
> > there won't be any because they all appear in the future. Due to that
> > the next checkpoint will truncate the clog to the cutoff multi xid used
> > by the last vacuum.
>
> Right.
>
> > Am I missing something?
>
> My point is that the cutoff multi xid won't be new enough to remove
> non-LOCKED_ONLY (ie, post-9.3) mxids.

Why not? Afaics this will continue to happen until multixacts are
wrapped around once? So the cutoff multi will be new enough for that at
some point after the pg_upgrade?

Luckily, in most cases full-table vacuums triggered by normal xids
will prevent serious problems, though. There have been a couple of reports where
people included pg_controldata output indicating crazy rates of multixid
consumption, but I think none of those were crazy enough to burn multis
so fast that important ones get truncated before a full-table vacuum
occurs due to normal xids.

> >> But in any case, we both agree that setting relminmxid to equal nextMulti
> >> is completely unsafe in a 9.3 cluster that's already been up. So the
> >> proposed fix instructions are certainly wrong.
>
> > Right. I'm pondering what to do about it instead. The best idea I have
> > is something like:
> > 1) Jot down pg_controldata|grep NextMultiXactId
> > 2) kill/wait for all existing transactions to end
> > 3) vacuum all databases with vacuum_multixact_freeze_min_age=0. That'll
> > get rid of all old appearing multis
> > 4) Update pg_class to set relminmxid=value from 1), same with
> > pg_database
>
> > But that sucks and doesn't deal with all the problems :(
>
> Yeah. At this point I'm of the opinion that we should not recommend any
> manual corrective actions for this issue. They're likely to do more harm
> than good, especially if the user misses or fat-fingers any steps.

I don't really see us coming up with something robust in time :/. It's a
bit sad, but maybe we should recommend contacting the mailing list if
pg_upgrade has been used and nextMulti is above 2^31?

Btw, we really should have a txid_current() equivalent for multis...

> I'm also thinking that the lack of any complaints suggests there are few
> or no existing installations with nextMulti past 2^31, anyhow. If it were
> even past 400000000 (default autovacuum_multixact_freeze_max_age), we'd
> have been hearing howls of anguish about full-database freezing scans
> occurring immediately after a pg_upgrade (thanks to minmxid = 1 being old
> enough to trigger that).

I think people just chalk that up to 'crazy pg vacuuming behaviour' and
don't investigate further. At least that's my practical experience :(

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
