From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Timothy Garnett <tgarnett(at)panjiva(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) |
Date: | 2015-05-08 06:25:39 |
Message-ID: | CA+TgmoZiHwybETx8NZzPtoSjprg2Kcr-NaWGajkzcLcbVJ1pKQ@mail.gmail.com |
Lists: | pgsql-bugs |
On Thu, May 7, 2015 at 7:58 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Now I
>> understand the suggestion that the checkpoint code could be in charge
>> of advancing the oldest multixact + offset.
>
> Yeah. I think we need to pursue that angle unless somebody has a better idea.

So here's a patch that does that. It turns out to be pretty simple: I
just moved the DetermineSafeOldestOffset() calls around. As far as I
can see, and that may not be far enough at this hour of the morning,
we just need two: one in StartupMultiXact, so that we initialize the
values correctly after reading the control file; and then another at
the very end of TruncateMultiXact, so we update it after each
checkpoint or restartpoint. This leaves the residual problem that
autovacuum doesn't directly advance the stop point - the following
checkpoint does. We could handle that by requesting a checkpoint if
oldestMultiXactId is ahead of lastCheckpointedOldest by enough that a
checkpoint would free up some space, although one might hope that
people won't be living on the edge to quite that degree.
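
The checkpoint-request idea above could be sketched roughly as follows. This is a toy model in Python, not PostgreSQL code and not part of the patch: the function names and the `members_per_segment` parameter are hypothetical, and it assumes the oldest offset only ever moves forward. It treats "a checkpoint would free up some space" as "the oldest offset has moved into a different members segment than the one recorded at the last checkpoint".

```python
def segment_of(offset, members_per_segment):
    """Index of the members segment file that holds this offset."""
    return offset // members_per_segment


def checkpoint_would_free_space(oldest_offset, last_checkpointed_offset,
                                members_per_segment):
    """True if truncating up to oldest_offset would remove a whole segment
    that truncating up to last_checkpointed_offset would leave behind.

    Both offsets are member offsets; the first corresponds to the current
    oldestMultiXactId, the second to lastCheckpointedOldest (assumption).
    """
    return (segment_of(oldest_offset, members_per_segment)
            != segment_of(last_checkpointed_offset, members_per_segment))
```

With a made-up segment size of 50 members, moving the oldest offset from 10 to 40 stays inside one segment, so no checkpoint would be requested; moving it from 10 to 100 crosses a segment boundary, so one would.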

As things are, I believe this sequence is possible:

1. The members SLRU is full all the way up to offsetStopLimit.
2. A checkpoint occurs, reaching MultiXactSetSafeTruncate(), which
sets lastCheckpointedOldest.
3. Vacuum runs, calling SetMultiXactIdLimit(), calling
DetermineSafeOldestOffset(), advancing
MultiXactState->offsetStopLimit.
4. Since offsetStopLimit > lastCheckpointedOffset, it's now possible
for someone to consume an MXID greater than offsetStopLimit, making
MultiXactState->nextOffset > lastCheckpointedOffset.
5. The checkpoint from step 2, continuing on its merry way, now calls
TruncateMultiXact(), which sets rangeEnd > rangeStart and blows away
nearly every file in the SLRU.
I haven't confirmed this yet, so I might still be all wet, especially
since it is late here.
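
To make the five-step sequence above easier to follow, here is a toy ring model of it in Python. It is only an illustration under loud assumptions: the segment count, the variable names, and especially the truncation rule (keep the segments from the checkpointed oldest forward to the next segment, remove everything else) are invented for the sketch and are not the actual TruncateMultiXact() logic. It shows only how a stale checkpointed value combined with a wrapped next offset can make almost the whole SLRU look removable.

```python
# Toy model: the members SLRU as a ring of NUM_SEGMENTS segment files.
# All names and the truncation rule are illustrative assumptions, not
# PostgreSQL's actual data structures or TruncateMultiXact logic.
NUM_SEGMENTS = 16


def ring_range(start, end, size=NUM_SEGMENTS):
    """Segments from start (inclusive) forward to end (exclusive) on the ring."""
    count = (end - start) % size
    return {(start + i) % size for i in range(count)}


def truncate(on_disk, checkpointed_oldest, next_seg):
    """Assumed rule: keep checkpointed_oldest..next_seg, remove the rest."""
    keep = ring_range(checkpointed_oldest, next_seg) | {next_seg}
    return on_disk & keep


def run(race):
    on_disk = set(range(NUM_SEGMENTS))   # step 1: the SLRU is full
    checkpointed_oldest = 2              # step 2: checkpoint records oldest
    next_seg = 1                         # writers are stopped just behind it
    if race:
        vacuum_oldest = 6                # step 3: vacuum advances the limit
        next_seg = 4                     # step 4: writers pass segment 2,
        assert next_seg < vacuum_oldest  #         still inside the new limit
    # step 5: the checkpoint truncates using its stale value
    return truncate(on_disk, checkpointed_oldest, next_seg)


safe = run(race=False)
raced = run(race=True)
print(len(safe), "segments survive without the race")
print(len(raced), "segments survive with the race:", sorted(raced))
```

In this model all 16 segments survive when vacuum does not intervene, but only segments 2 through 4 survive when it does, even though the live data at that point runs from segment 6 around the ring to segment 4.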
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment | Content-Type | Size |
---|---|---|
multixact-truncate-race.patch | binary/octet-stream | 3.1 KB |