From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Timothy Garnett <tgarnett(at)panjiva(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) |
Date: | 2015-05-09 12:43:49 |
Message-ID: | 8486B09E-773B-4838-A7E8-8E48433245E1@gmail.com |
Lists: | pgsql-bugs |
On May 9, 2015, at 8:00 AM, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> On Sat, May 9, 2015 at 2:46 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Fri, May 8, 2015 at 9:55 PM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
>>> Thomas Munro wrote:
>>>> I think the fix is something like "if nextMXact == oldestMultiXactId,
>>>> then there are no active multixacts, so the offsetStopLimit should be
>>>> set to nextOffset - (a segment's worth)".
>>>
>>> Makes sense.
>>
>> Here's a patch that attempts to implement this.
>
> Thanks. I think I have managed to reproduce something like the data
> loss race that we were speculating about.
>
> 0. initdb, autovacuum = off, set up explode_mxact_members.c as
> described elsewhere in the thread.
> 1. Fill up the members SLRU completely (i.e., reach the state where you can
> no longer create a new multixact of any size). pg_multixact/members
> contains 82040 files and the last one is named 14077.
> 2. Issue CHECKPOINT, but use a debugger to stop inside
> TruncateMultiXact after it has read
> MultiXactState->lastCheckpointedOldest and released the lock, but
> before it calls SlruScanDirectory to delete files...
> 3. Run VACUUM FREEZE in all databases (including template0). datminmxid moves.
> 4. Create lots of new multixacts. pg_multixact/members now contains
> 82041 files and the last one is named 14078 (i.e., one extra segment,
> with the highest possible segment number, which couldn't be created
> before vacuuming because of the one-segment gap enforced by
> DetermineSafeOldestOffset). Segments 0000-0016 have new modification
> times.
> 5. ... allow the checkpoint started in step 2 to continue. It
> deletes segments, keeping only 0000-0016. The segment 14078, which
> contained active member data, has been incorrectly deleted.

OK. So the next question is: if you then apply the other patch, does that prevent step 4 and thereby avoid catastrophe?
...Robert