From: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Timothy Garnett <tgarnett(at)panjiva(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) |
Date: | 2015-05-09 21:41:02 |
Message-ID: | CAEepm=3ctG4RZZDjUycMx0_TkSUAVmVKJzowGDwzTy_BEFZcjQ@mail.gmail.com |
Lists: | pgsql-bugs |
On Sun, May 10, 2015 at 12:43 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On May 9, 2015, at 8:00 AM, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>>> On Sat, May 9, 2015 at 2:46 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>> On Fri, May 8, 2015 at 9:55 PM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
>>>> Thomas Munro wrote:
>>>>> I think the fix is something like "if nextMXact == oldestMultiXactId,
>>>>> then there are no active multixacts, so the offsetStopLimit should be
>>>>> set to nextOffset - (a segment's worth)".
>>>>
>>>> Makes sense.
>>>
>>> Here's a patch that attempts to implement this.
>>
>> Thanks. I think I have managed to reproduce something like the data
>> loss race that we were speculating about.
>>
>> 0. initdb, autovacuum = off, set up explode_mxact_members.c as
>> described elsewhere in the thread.
>> 1. Fill up the members SLRU completely (ie reach state where you can
>> no longer create a new multixact of any size). pg_multixact/members
>> contains 82040 files and the last one is named 14077.
>> 2. Issue CHECKPOINT, but use a debugger to stop inside
>> TruncateMultiXact after it has read
>> MultiXactState->lastCheckpointedOldest and released the lock, but
>> before it calls SlruScanDirectory to delete files...
>> 3. Run VACUUM FREEZE in all databases (including template0). datminmxid moves.
>> 4. Create lots of new multixacts. pg_multixact/members now contains
>> 82041 files and the last one is named 14078 (ie one extra segment,
>> with the highest possible segment number, which couldn't be created
>> before vacuuming because of the one segment gap enforced by
>> DetermineSafeOldestOffset). Segments 0000-0016 have new modified
>> times.
>> 5. ... allow the checkpoint started in step 2 to continue. It
>> deletes segments, keeping only 0000-0016. The segment 14078 which
>> contained active member data has been incorrectly deleted.
>
> OK. So the next question is: if you then apply the other patch, does that prevent step 4 and thereby avoid catastrophe?
Yes, in a quick test, at step 4 I couldn't proceed. I need to prod
this some more on Monday, and also see how it interacts with
autovacuum's view of what work needs to be done.
Here is my attempt at a summary. In master, we have 3 arbitrarily
overlapping processes:
1. VACUUM advances oldest multixact and member tail.
2. CHECKPOINT observes member tail and head (separately) and then
deletes storage.
3. Regular transaction observes tail, checks boundary and advances head.
Information flows from 1 to 2, from 3 to 2, and from 1 to 3. Process 2
doesn't have a consistent view of head and tail, and doesn't prevent
them from moving while it deletes storage, so the effect is that we can
delete the wrong range of storage.
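To make the race concrete, here is a toy Python model of it. It is only a sketch, not PostgreSQL code: the 16-segment ring, the function names and the exact tail/head positions are all made up for illustration, and it simply replays steps 2-5 of the reproduction with a stale tail and a fresh head.

```python
# Toy model only: member segments live on a small ring (hypothetical size).
NUM_SEGMENTS = 16


def live_range(tail, head, n=NUM_SEGMENTS):
    """Segments from tail to head inclusive, wrapping around the ring."""
    count = (head - tail) % n + 1
    return {(tail + i) % n for i in range(count)}


def truncate(files, tail, head):
    """Keep only the files the checkpoint believes are live."""
    return files & live_range(tail, head)


def replay_master_race():
    files = set(range(15))   # full: segments 0..14 exist, 15 is the enforced gap
    stale_tail = 0           # step 2: checkpoint reads the tail, then stalls
    true_tail = 15           # step 3: VACUUM FREEZE moves the tail forward
    head = 2                 # step 4: new members fill 15, then wrap into 0..2
    needed = live_range(true_tail, head)
    files |= needed
    # Step 5: the checkpoint resumes with the stale tail but a fresh head.
    remaining = truncate(files, stale_tail, head)
    return needed, remaining


needed, remaining = replay_master_race()
print("needed:", sorted(needed))
print("remaining:", sorted(remaining))
print("lost:", sorted(needed - remaining))
```

In this model segment 15 plays the role of 14078 in the reproduction: it holds active member data, but it falls outside the range computed from the stale tail and the fresh head, so it is deleted.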
With the patch, we have 3 arbitrarily overlapping processes:
1. VACUUM advances oldest multixact.
2. CHECKPOINT observes oldest multixact, deletes storage and then
advances member tail.
3. Regular transaction observes member tail, checks boundary and
advances member head.
Information flows from 1 to 2 and from 2 to 3. Although 2 works with
a snapshot of the oldest multixact which may move before it deletes
storage, 2 knows that the member tail isn't moving (that is its job),
and that 3 can't move the head past the tail (or rather the stop
limit, which is the tail minus a gap), so the effect of using an
out-of-date oldest multixact is that we err on the side of being too
conservative with our allocation of member space, which is good.
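The same toy model can be rearranged to follow the patched ordering. Again this is a sketch under invented assumptions (a 16-segment ring, a one-segment gap, hypothetical class and method names), not the patch itself: only the checkpoint moves the tail, and it does so after deleting storage, while transactions check the stop limit derived from that tail.

```python
# Toy model only: hypothetical ring size and gap, not PostgreSQL constants.
NUM_SEGMENTS = 16
GAP = 1


class MemberSpace:
    def __init__(self):
        self.tail = 0      # advanced only by the checkpoint
        self.head = 0      # advanced only by regular transactions
        self.files = {0}

    def try_advance_head(self):
        """Regular transaction: refuse to step onto the stop limit."""
        stop_limit = (self.tail - GAP) % NUM_SEGMENTS
        nxt = (self.head + 1) % NUM_SEGMENTS
        if nxt == stop_limit:
            return False
        self.head = nxt
        self.files.add(nxt)
        return True

    def checkpoint(self, oldest_snapshot):
        """Delete from the current tail up to the snapshot, then publish the tail."""
        seg = self.tail
        while seg != oldest_snapshot:
            self.files.discard(seg)
            seg = (seg + 1) % NUM_SEGMENTS
        self.tail = oldest_snapshot


def replay_patched():
    space = MemberSpace()
    while space.try_advance_head():   # fill until the stop limit is hit
        pass
    stale_oldest = 0                  # checkpoint takes its snapshot, then stalls
    true_oldest = 14                  # VACUUM advances the oldest multixact
    blocked = not space.try_advance_head()   # tail has not moved yet
    space.checkpoint(stale_oldest)    # stale snapshot: deletes nothing
    files_after_stale = set(space.files)
    space.checkpoint(true_oldest)     # a later checkpoint frees 0..13
    resumed = space.try_advance_head()
    return blocked, files_after_stale, space.files, resumed


print(replay_patched())
```

With the stale snapshot the checkpoint deletes too little rather than too much, and new members are refused until a later checkpoint moves the tail.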
I suppose you could have a workload that eats member space really fast
and checkpoints too infrequently so that you run out of space before a
checkpoint advances the tail. I think that is why you were suggesting
triggering checkpoints automatically in some cases. But I think that
would be a pretty insane workload (I can't convince my computer to
consume 2^32 member elements in under a couple of hours using the
pathological explode_mxact_members.c workload, and you can't set
checkpoint_timeout above 1 hour).
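For a rough sense of scale, the arithmetic can be sketched as below. The per-page and per-segment constants are assumptions (my understanding of the defaults for 8 kB pages, not something stated in this thread); they do reproduce the 14078 file name seen in the reproduction, which is some reassurance.

```python
# Assumed constants for a default build with 8 kB pages (not from this thread).
MEMBERS_PER_PAGE = 1636
PAGES_PER_SEGMENT = 32

MEMBER_SPACE = 2**32               # member offsets are 32 bits wide
CHECKPOINT_TIMEOUT_MAX_S = 3600    # the 1 hour ceiling mentioned above

members_per_segment = MEMBERS_PER_PAGE * PAGES_PER_SEGMENT
last_segment = (MEMBER_SPACE - 1) // members_per_segment

# Rate needed to consume the whole member space between two timed checkpoints.
required_rate = MEMBER_SPACE / CHECKPOINT_TIMEOUT_MAX_S

print("members per segment:", members_per_segment)
print("highest segment file name:", format(last_segment, "X"))
print("members/second to wrap in one hour:", round(required_rate))
```

Under these assumptions a workload would have to allocate well over a million member entries per second, sustained for a full hour, to exhaust the space between two timed checkpoints.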
--
Thomas Munro
http://www.enterprisedb.com