On Fri, Jun 5, 2015 at 11:47 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Fri, Jun 5, 2015 at 9:29 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Here's a new version with some more fixes and improvements:
>>
>> - SetOffsetVacuumLimit was failing to set MultiXactState->oldestOffset
>> when the oldest offset became known if the now-known value happened to
>> be zero. Fixed.
>>
>> - SetOffsetVacuumLimit now logs useful information at the DEBUG1
>> level, so that you can see that it's doing what it's supposed to.
>>
>> - TruncateMultiXact now calls DetermineSafeOldestOffset to adjust the
>> offsetStopLimit even if it can't truncate anything. This seems
>> useless, but it's not, because it may be that the last checkpoint
>> advanced lastCheckpointedOldest from a bogus value (i.e. 1) to a real
>> value, and now we can actually set offsetStopLimit properly.
>>
>> - TruncateMultiXact no longer calls find_multixact_start when there
>> are no remaining multixacts. This is actually a completely separate
>> bug that goes all the way back to 9.3.0 and can potentially cause
>> TruncateMultiXact to remove every file in pg_multixact/offsets.
>> Restarting the cluster becomes impossible because TrimMultiXact barfs.
>>
>> - TruncateMultiXact now logs a message if the oldest multixact does
>> not precede the earliest one on disk and is not equal to the next
>> multixact and yet does not exist. The value of the log message is
>> that it discovered the bug mentioned in the previous line, so I think
>> it's earning its keep.
>>
>> With this version, I'm able to see that when you start up a
>> 9.3.latest+this patch with a cluster that has a bogus value of 1 in
>> relminmxid, datminmxid, and the control file, autovacuum vacuums
>> everything in sight, all the values get set back to the right thing,
>> and the next checkpoint enables the member-wraparound guards. This
>> works with both autovacuum=on and autovacuum=off; the emergency
>> mechanism kicks in as intended. We'll want to warn people with big
>> databases who upgrade to 9.3.0 - 9.3.4 via pg_upgrade that they may
>> want to pre-vacuum those tables before upgrading to avoid a vacuum
>> storm. But generally I'm pretty happy with this: forcing those values
>> to get fixed so that we can guard against member-space wraparound
>> seems like the right thing to do.
>>
>> So, to summarize, this patch does the following:
>>
>> - Fixes the failure-to-start problems introduced in 9.4.2 in
>> complicated pg_upgrade scenarios.
>> - Prevents the new calls to find_multixact_start we added in 9.4.2
>> from happening during recovery, where they can only create failure
>> scenarios. The call in TruncateMultiXact that has been there all
>> along is not eliminated, but now handles failure more gracefully.
>> - Fixes possible incorrect removal of every single
>> pg_multixact/offsets file when no multixacts exist; one file should be
>> kept.
>> - Forces aggressive autovacuuming when the control file's
>> oldestMultiXid doesn't point to a valid MultiXact and enables member
>> wraparound at the next checkpoint following the correction of that
>> problem.
>
> With this patch, when I run the script
> "checkpoint-segment-boundary.sh" from
> http://www.postgresql.org/message-id/CAEepm=1_KbHGbmPVmkUGE5qTP+B4efoCJYS0unGo-Mc5NV=UDg@mail.gmail.com
> I see the following during shutdown checkpoint:
>
> LOG: could not truncate directory "pg_multixact/offsets": apparent wraparound
>
> That message comes from SimpleLruTruncate.
Suggested patch attached.
--
Thomas Munro
http://www.enterprisedb.com