Re: Impact of multixact "members" limit exceeded

From: Daniel Lagerman <spjheruur(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Impact of multixact "members" limit exceeded
Date: 2018-06-14 22:57:24
Message-ID: CAAw8gvibW6PMKAuv9utBtgzZDo-sZY=1Ex1s3zDKtWYVgg2=Gw@mail.gmail.com
Lists: pgsql-general

>
> Hello Daniel
>
> The advice from Lawrence downthread is very much on point. If you can
> upgrade to 9.5, do it. He mentions savepoints, but one more case is
> plpgsql blocks with EXCEPTION clauses.
>

Hello Álvaro,

Thanks, I'll check both, though I don't believe either is the case here.

> Okay. There's at least one related bug fix in 9.4.4 (b6a3444fa635) but
> also 8507a5b37bd9 and ec1408155d35 are related, so you'd be better
> served by an upgrade anyway.
>

I'm not sure how to look up those commits, but there are certainly a lot of
bug fixes, and I'm pushing for an upgrade, as 9.5 and 9.6 have optimizations
and bug fixes that would make my life much easier. It looks like we may have
hit the bug where members run ahead of offsets, which I believe has been
fixed in later versions.
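For reference, one rough way to see on 9.4 how far members have run ahead of
multixact IDs is to compare the counters that pg_controldata reports from the
last checkpoint (a sketch; the data directory path is an assumption, adjust
it for your installation):

```shell
# Compare the next multixact ID with the next member offset from the last
# checkpoint; the member offset growing much faster than the ID means each
# multixact carries many members, which is what fills the members SLRU.
pg_controldata /var/lib/postgresql/9.4/main | grep "NextMulti"
```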

>
> Set vacuum_multixact_freeze_table_age to 0 (causing all vacuums to scan
> tables fully) and vacuum_multixact_freeze_min_age to some value like one
> million (so that they remove most of the oldest multixacts, leaving
> just the frontmost one million). Then vacuum the tables with the
> highest multixact ages. Make sure the datminmxid advances in all
> databases (incl. postgres template0 template1); once it has advanced,
> the oldest member files are removed. Don't waste time processing tables
> with datminmxid higher than the minimum, as that won't free up any
> member space.
>

Thanks, we opted to go with a min age of 1 million and a table age of 100
million; the tables in question are all way over that anyway. We completed
the vacuum on all but two of the larger tables, and I'm running VACUUM on
those right now after also bumping maintenance_work_mem a bit. As this is
9.4 I can't see the progress, except that I noticed it initially uses less
CPU and, once it finally gets going, more (10% -> 50%), but that is fine. So
my hope is that the vacuum completes overnight.
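For anyone following along, the tables worth vacuuming first can be found by
looking at relminmxid; a sketch that should work on 9.4 (on 9.5+ the
mxid_age() function makes this nicer):

```sql
-- Per-database floor; the oldest datminmxid across the cluster (including
-- template0 and template1) is what gates removal of old member files.
SELECT datname, datminmxid FROM pg_database;

-- Within a database: the tables holding datminmxid back, oldest first.
SELECT relname, relminmxid
FROM pg_class
WHERE relkind IN ('r', 't')
ORDER BY relminmxid::text::bigint
LIMIT 20;
```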

> Corruption should not occur because 9.4.3 already contains the member
> wraparound protections (commit 3ecab37d97ed).
>

Glad to hear that. To be honest, I have not seen any problems except for the
pg_srv log filling up.

> Uhh .. pg_repack has been known to cause catalog corruption. I don't
> know if those bugs have been fixed. At 2ndQuadrant we recommend *not*
> running pg_repack.
>
>
I know versions before 1.2 had some corruption issues; I have never seen
them myself in the roughly 20 times I have used it. But never say never,
right? I ran the same operation on a duplicate server and there are no
issues there, so I believe we were simply far behind on vacuum.

> work_mem does not affect vacuum. maintenance_work_mem is the one to
> change; autovacuum_work_mem can be used for autovacuum.
>

Yep, that was a typo on my end; I was referring to maintenance_work_mem, of
course.

> "Zero pauses" (vacuum_cost_delay=0) is best, because it's obviously much
> faster.
>

Yes, I will look into that over the weekend if needed, combined with more
maintenance_work_mem if the vacuum has not completed by then. I can't really
run with zero pauses during normal operations, as I need to write to the DB.
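A sketch of the session-level settings for such a weekend run (the table
name is hypothetical); note that in 9.4 vacuum cannot use more than 1 GB of
maintenance_work_mem for its dead-tuple array regardless of the setting:

```sql
SET vacuum_cost_delay = 0;                  -- no cost-based throttling pauses
SET maintenance_work_mem = '1GB';           -- vacuum's effective cap in 9.4
SET vacuum_multixact_freeze_table_age = 0;  -- force full-table scans
SET vacuum_multixact_freeze_min_age = 1000000;
VACUUM VERBOSE big_table;
```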

>
> So what did you do, cancel it?
>

Yes, I actually used flexible freeze with a timeout, so it terminated just
before the maintenance window ended. However, now we are running a straight
VACUUM on these tables because of this problem. I might even opt not to
restart the server this weekend; we don't really need the weekly restart to
keep it going, since it seems to have a good pace now.

Many thanks for your insight and guidance!

Best regards

Daniel

On Thu, Jun 14, 2018 at 10:37 AM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
wrote:

> Hello Daniel
>
> The advice from Lawrence downthread is very much on point. If you can
> upgrade to 9.5, do it. He mentions savepoints, but one more case is
> plpgsql blocks with EXCEPTION clauses.
>
> On 2018-Jun-13, Daniel Lagerman wrote:
>
> > Hello Álvaro,
> >
> > I'm running at 9.4.3, I know its an older version but upgrading it
> outside
> > the scope right now for various reasons.
>
> Okay. There's at least one related bug fix in 9.4.4 (b6a3444fa635) but
> also 8507a5b37bd9 and ec1408155d35 are related, so you'd be better
> served by an upgrade anyway.
>
> > Based on the settings I posted what would be your recommended settings
> > right now to resolve this situation?
>
> Set vacuum_multixact_freeze_table_age to 0 (causing all vacuums to scan
> tables fully) and vacuum_multixact_freeze_min_age to some value like one
> million (so that they remove most of the oldest multixacts, leaving
> just the frontmost one million). Then vacuum the tables with the
> highest multixact ages. Make sure the datminmxid advances in all
> databases (incl. postgres template0 template1); once it has advanced,
> the oldest member files are removed. Don't waste time processing tables
> with datminmxid higher than the minimum, as that won't free up any
> member space.
>
> > I very much appreciate the recommendation but I would also like to now
> the
> > impact of this warning, data right now does not seem to be corrupted, get
> > overwritten, for that matter not insert new data or cause issues when
> read.
>
> Corruption should not occur because 9.4.3 already contains the member
> wraparound protections (commit 3ecab37d97ed).
>
> > It just seems that is has stopped creating "members" and I can't read
> into
> > what problem that would cause, table was around 1.6 TB, it was cleaned up
> > and the re-packed using pg-repack to go down to around 400 GB in size,
> > however a freeze vacuum or autovacuum did not complete.
>
> Uhh .. pg_repack has been known to cause catalog corruption. I don't
> know if those bugs have been fixed. At 2ndQuadrant we recommend *not*
> running pg_repack.
>
> > Then when it comes to vacuum, do you think its worth increasing work_mem
> to
> > say 200 GB,
>
> work_mem does not affect vacuum. maintenance_work_mem is the one to
> change; autovacuum_work_mem can be used for autovacuum.
>
> > for the maintenance period only, and let autovacuum go to town
> > on the tables with the highest age and lowest minmixid? Or should one run
> > for example flexible freeze instead with zero pauses as nothing is
> > happening on the weekends, which means no inbound transactions.
>
> "Zero pauses" (vacuum_cost_delay=0) is best, because it's obviously much
> faster.
>
> > I have tried running autovacuum for 40 hours before with 5 GB work_mem,
> on
> > the weekend, across 6 tables at the same time and it never completed on
> the
> > bigger tables which are around 2 TB.
>
> So what did you do, cancel it?
>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
