Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Timothy Garnett <tgarnett(at)panjiva(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
Date: 2015-05-11 15:00:10
Message-ID: CA+TgmoYO1Me9mxpKriVzCCDYBSk3WSVLXCEY1d9YX-mtqDQPmQ@mail.gmail.com
Lists: pgsql-bugs

On Mon, May 11, 2015 at 7:56 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Mon, May 11, 2015 at 2:45 PM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> On Sun, May 10, 2015 at 9:41 AM, Thomas Munro
>> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>>> On Sun, May 10, 2015 at 12:43 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>> OK. So the next question is: if you then apply the other patch, does that prevent step 4 and thereby avoid catastrophe?
>>>
>>> Yes, in a quick test, at step 4 I couldn't proceed. I need to prod
>>> this some more on Monday, and also see how it interacts with
>>> autovacuum's view of what work needs to be done.
>>
>> The code in master which handles regular autovacuums seems correct
>> with this patch, because it measures member space usage by calling
>> find_multixact_start itself with the oldest multixact ID (it's not
>> dependent on anything that is updated at checkpoint time).
>>
>> The code in the patch at
>> http://www.postgresql.org/message-id/CA+TgmobbaQpE6sNqT30+rz4UMH5aPraq20gko5xd2ZGajz1-Jg@mail.gmail.com
>> would become wrong though, because it would use the (new) variable
>> MultiXactState->oldestOffset (set at checkpoint) to measure the used
>> member space. That means it would repeatedly launch autovacuums, even
>> after clearing away the problem and advancing the oldest multixact ID,
>> until the next checkpoint updates that value. In other words, it
>> can't see its own progress immediately (which is the right approach
>> for blocking new multixact generation, i.e. defer until
>> checkpoint/truncation, but the wrong approach for triggering
>> autovacuums).
>>
>> I think vacuum (SetMultiXactIdLimit?) needs to update oldestOffset,
>> not checkpoint (DetermineSafeOldestOffset). (The reason for wanting
>> this new value in shared memory is because GetNewMultiXactId needs to
>> be able to check it cheaply for every call, so calling
>> find_multixact_start every time would presumably not fly).
>
> Here's a new version of the patch to do that. As before, it tracks
> the oldest offset in shared memory, but now that is updated in
> SetMultiXactIdLimit, so it is always updated at the same time as
> MultiXactState->oldestMultiXactId (at startup and after full scan
> vacuums).
>
> The value is used in the following places:
>
> 1. GetNewMultiXactId uses it to see if it needs to send
> PMSIGNAL_START_AUTOVAC_LAUNCHER to request autovacuums even if
> autovacuum is set to off. That is the main purpose of this patch.
> (GetNewMultiXactId *doesn't* use it for member wraparound prevention:
> that is based on offsetStopLimit, set by checkpoint code after
> truncation of physical storage.)
>
> 2. SetMultiXactIdLimit itself also uses it to send a
> PMSIGNAL_START_AUTOVAC_LAUNCHER signal to the postmaster (according to
> comments this allows immediately doing some more vacuuming upon
> completion if necessary).
>
> 3. ReadMultiXactCounts, called by regular vacuum and autovacuum,
> rather than doing its own call to find_multixact_start, now also reads
> it from shared memory. (Incidentally, the code this replaces suffers
> from the problem fixed elsewhere: it can call find_multixact_start for
> a multixact that doesn't exist yet.)
>
> Vacuum runs as expected with autovacuum off.

Great. I've committed this and back-patched it to 9.3, after making
your code look a little more like what I already committed for the
same task, and whacking the comments around.

> Do you think we
> should be using MULTIXACT_MEMBER_DANGER_THRESHOLD as the trigger level
> for forced vacuums instead of MULTIXACT_MEMBER_SAFE_THRESHOLD, or
> something else?

No, I think you have it right.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
