Re: Problems with pg_locks explosion

From: Armand du Plessis <adp(at)bank(dot)io>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Problems with pg_locks explosion
Date: 2013-04-02 08:16:57
Message-ID: CANf99sXs9OZXqdaoyaAk47QD=LcVk9cDFvR=+_3ezCwy76Hp0Q@mail.gmail.com
Lists: pgsql-performance

Touch wood, but I think I found the problem thanks to these pointers. I
checked vm.zone_reclaim_mode and mine was set to 0. However, just before
the locking starts I can see many of my CPUs flash red and jump to a high
percentage of sys usage. When I look at top, it's the kernel migration tasks
that seem to trigger it.

So it seems the scheduler was a bit trigger-happy with task migrations.
Setting kernel.sched_migration_cost to 5000000 (5 ms; the value is in
nanoseconds) seems to have resolved my woes. I have yet to see locks climb,
and it's been running stable for a bit. This post was invaluable in
explaining the cause:
http://www.postgresql.org/message-id/50E4AAB1.9040902@optionshouse.com

# Postgres Kernel Tweaks
kernel.sched_migration_cost = 5000000
# kernel.sched_autogroup_enabled = 0

The second recommended setting 'sched_autogroup_enabled' is not available
on the kernel I'm running but it doesn't seem to be a problem.
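
For anyone wanting to check the same two settings on their own box, here is
a minimal sketch, assuming the usual mapping of a sysctl key such as
kernel.foo to the file /proc/sys/kernel/foo. The helper names (read_sysctl,
report) are made up for illustration, and the target values are simply the
ones quoted above. A key that a given kernel does not expose is reported as
unavailable rather than treated as an error.

```python
from pathlib import Path

# Target values quoted in this message (assumption: same targets apply to you).
WANTED = {
    "kernel.sched_migration_cost": 5000000,  # nanoseconds, i.e. 5 ms
    "kernel.sched_autogroup_enabled": 0,
}


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its file under the given root."""
    return Path(root, *key.split("."))


def read_sysctl(key, root="/proc/sys"):
    """Return the current integer value, or None if the key cannot be read."""
    try:
        return int(sysctl_path(key, root).read_text().strip())
    except (OSError, ValueError):
        return None


def report(wanted=WANTED, root="/proc/sys"):
    """Build one status line per key: ok, mismatch, or not available."""
    lines = []
    for key, target in wanted.items():
        current = read_sysctl(key, root)
        if current is None:
            status = "not available on this kernel"
        elif current == target:
            status = "ok"
        else:
            status = f"is {current}, want {target}"
        lines.append(f"{key}: {status}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

This only reads the values. Changing them still goes through sysctl or
/etc/sysctl.conf as shown above.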

Again, thanks for the help. It was seriously appreciated. Long night
was long.

If things change and the problem pops up again I'll update you guys.

Cheers,

Armand

On Tue, Apr 2, 2013 at 8:43 AM, Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> wrote:

> Also it is worth checking what your sysctl vm.zone_reclaim_mode is set to
> - if 1 then override to 0. As Jeff mentioned, this gotcha for machines with
> larger CPU counts has been discussed at length on this list - but it still
> traps us now and again!
>
> Cheers
>
> Mark
>
>
> On 02/04/13 19:33, Armand du Plessis wrote:
>
>> I had my reservations about my almost 0% IO usage on the raid0 array as
>> well. I'm looking at the numbers in atop and it doesn't seem to reflect
>> the aggregate of the volumes as one would expect. I'm just happy I am
>> seeing numbers on the volumes, they're not too bad.
>>
>> One thing I was wondering about, as a last possible IO resort: Provisioned
>> EBS volumes require that you maintain a wait queue of 1 for every 200
>> provisioned IOPS to get reliable IO. My wait queue hovers between 0-1
>> and with the 1000 IOPS it should be 5. Even thought about artificially
>> pushing more IO to the volumes but I think Jeff's right, there's some
>> internal kernel voodoo at play here. I have a feeling it'll be under
>> control with pg_pool (if I can just get the friggen setup there right)
>> and then I'll have more time to dig into it deeper.
>>
>> Apologies to the kittens for interrupting your leave :)
>>
>>
>
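
The queue-depth rule of thumb quoted above (a wait queue of 1 for every 200
provisioned IOPS) works out as follows. This is only a sketch of the
arithmetic under that stated rule; the function names are hypothetical and
the 200 figure is taken from the message, not checked against current EBS
guidance.

```python
import math


def target_queue_depth(provisioned_iops, iops_per_queue_slot=200):
    """Queue depth suggested by the 1-per-200-IOPS rule, rounded up."""
    return math.ceil(provisioned_iops / iops_per_queue_slot)


def queue_shortfall(observed_depth, provisioned_iops):
    """How far the observed queue depth sits below the suggested one."""
    return max(0, target_queue_depth(provisioned_iops) - observed_depth)


if __name__ == "__main__":
    # 1000 provisioned IOPS, observed queue hovering around 1
    print(target_queue_depth(1000), queue_shortfall(1, 1000))
```

With 1000 provisioned IOPS the suggested depth is 5, so a queue hovering
between 0 and 1 is well short of it.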
