Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Noah Misch <noah(at)leadboat(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Steve Kehlet <steve(dot)kehlet(at)gmail(dot)com>, Forums postgresql <pgsql-general(at)postgresql(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Date:	2015-06-04 13:42:22
Message-ID:	CA+TgmobJ4gMEdrQig7NufJWoX1TBTiP0aD0D7ZL9j6TMy+JqJw@mail.gmail.com
Lists:	pgsql-general pgsql-hackers

On Thu, Jun 4, 2015 at 2:42 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> I like that change a lot. It's much easier to seek forgiveness for wasting <=
> 28 GiB of disk than for deleting visibility information wrongly.

I'm glad you like it. I concur.

>> 2. If setting the offset stop limit (the point where we refuse to
>> create new multixact space), we don't arm the stop point. This means
>> that if you're in this situation, you run without member wraparound
>> protection until it's corrected. A message gets logged once per
>> checkpoint telling you that you have this problem, and another message
>> gets logged when things get straightened out and the guards are
>> enabled.
>>
>> 3. If setting the vacuum force point, we assume that it's appropriate
>> to immediately force vacuum.
>
> Those seem reasonable, too.

Cool.

>> I've only tested this very lightly - this is just to see what you and
>> Noah and others think of the approach. As compared with the previous
>> approach, it has the advantage of making minimal assumptions about the
>> sanity of what's on disk. It has the disadvantage that, for some
>> people, the member-wraparound guard won't be enabled at startup -- but
>> note that those people can't start 9.3.7/9.4.2 *at all*, so currently
>> they are either running without member wraparound protection anyway
>> (if they haven't upgraded to those releases) or they're down entirely.
>
> That disadvantage is negligible, considering.

All right.

>> Another disadvantage is that we'll be triggering what may be quite a
>> bit of autovacuum activity for some people, which could be painful.
>> On the plus side, they'll hopefully end up with sane relminmxid and
>> datminmxid guards afterwards.
>
> That sounds good so long as each table requires just one successful emergency
> autovacuum. I'm not seeing code to ensure that the launched autovacuum will
> indeed perform a full-table scan and update relminmxid; is it there?

No. Oops.

> For sites that can't tolerate an autovacuum storm, what alternative can we
> provide? Is "SET vacuum_multixact_freeze_table_age = 0; VACUUM <table>" of
> every table, done before applying the minor update, sufficient?

I don't know. In practical terms, they probably need to ensure that
if pg_multixact/offsets/0000 does not exist, no relations have
relminmxid = 1 and no remaining databases have datminmxid = 1.
Exactly what it will take to get there is possibly dependent on which
minor release you are running; on current minor releases, I am hopeful
that what you propose is sufficient.

>> static void
>> -DetermineSafeOldestOffset(MultiXactId oldestMXact)
>> +DetermineSafeOldestOffset(MultiXactOffset oldestMXact)
>
> Leftover change from an earlier iteration? The values passed continue to be
> MultiXactId values.

Oopsie.

>> /* move back to start of the corresponding segment */
>> - oldestOffset -= oldestOffset %
>> - (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
>> + offsetStopLimit = oldestOffset - (oldestOffset %
>> + (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT));
>> + /* always leave one segment before the wraparound point */
>> + offsetStopLimit -= (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
>> +
>> + /* if nothing has changed, we're done */
>> + if (prevOffsetStopLimitKnown && offsetStopLimit == prevOffsetStopLimit)
>> + return;
>>
>> LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
>> - /* always leave one segment before the wraparound point */
>> - MultiXactState->offsetStopLimit = oldestOffset -
>> - (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
>> + MultiXactState->offsetStopLimit = oldestOffset;
>
> That last line needs s/oldestOffset/offsetStopLimit/, I presume.

Another oops.
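
For anyone following the arithmetic in that hunk, here is a standalone sketch of the limit computation with the s/oldestOffset/offsetStopLimit/ correction applied. It is an illustration only, not the patch. The function names compute_offset_stop_limit and stop_limit_changed are hypothetical, the constants are assumed values for a default 8 kB block size, and the locking and shared-memory update are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: member offsets are 32-bit unsigned and wrap modulo 2^32. */
typedef uint32_t MultiXactOffset;

/* Assumed values for a default 8 kB block size; illustrative only. */
#define MULTIXACT_MEMBERS_PER_PAGE 1636u
#define SLRU_PAGES_PER_SEGMENT 32u
#define MEMBERS_PER_SEGMENT \
	(MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT)

/*
 * Hypothetical helper: derive the stop limit from the oldest member
 * offset still in use, following the two steps in the quoted hunk.
 */
static MultiXactOffset
compute_offset_stop_limit(MultiXactOffset oldestOffset)
{
	MultiXactOffset offsetStopLimit;

	/* move back to start of the corresponding segment */
	offsetStopLimit = oldestOffset - (oldestOffset % MEMBERS_PER_SEGMENT);

	/* always leave one segment before the wraparound point */
	offsetStopLimit -= MEMBERS_PER_SEGMENT;

	return offsetStopLimit;
}

/*
 * Hypothetical helper mirroring the "if nothing has changed, we're done"
 * test: an update is needed unless a previous limit is known and equal.
 */
static bool
stop_limit_changed(bool prevKnown, MultiXactOffset prevLimit,
				   MultiXactOffset newLimit)
{
	return !(prevKnown && newLimit == prevLimit);
}
```

Because the type is unsigned, an oldest offset inside the first segment yields a limit that wraps around to just below 2^32, which is the intended "one segment before the wraparound point" behavior. The value stored in the shared state should be this computed limit, not oldestOffset itself.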

Thanks for the review.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 at 2015-06-04 06:42:26 from Noah Misch

Responses

Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 at 2015-06-04 16:57:42 from Robert Haas
