From: | Armand du Plessis <adp(at)bank(dot)io> |
---|---|
To: | Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> |
Cc: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-performance <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: Problems with pg_locks explosion |
Date: | 2013-04-02 09:55:14 |
Message-ID: | CANf99sX0jMaVzRo16QCtdymzzm7=kEROOkjooH1po_syFmrjPA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Jumped the gun a bit. the problem still exists like before. But it's
definitely on the right track, below is the output from top in the seconds
before the cluster locks up. For some reason still insisting on moving
tasks around despite bumping the sched_migration_cost cost up to 100ms.
77 root RT 0 0 0 0 S 32.3 0.0 13:55.20 [migration/24]
26512 postgres 20 0 8601m 7388 4992 R 32.3 0.0 0:02.17 postgres:
other_user xxxx(52944) INSERT
38 root RT 0 0 0 0 S 31.3 0.0 17:26.15 [migration/11]
65 root RT 0 0
0 0 S 30.0 0.0 13:18.66 [migration/20]
62 root RT 0 0 0 0 S 29.7 0.0 12:58.81 [migration/19]
47 root RT 0 0 0 0 S 29.0 0.0 18:16.43 [migration/14]
29 root RT 0 0 0 0 S 28.7 0.0 25:21.47 [migration/8]
71 root RT 0 0 0 0 S 28.4 0.0 13:20.31 [migration/22]
95 root RT 0 0 0 0 S 23.8 0.0 13:37.31 [migration/30]
26518 postgres 20 0 8601m 9684 5228 S 21.2 0.0 0:01.89 postgres:
other_user xxxxx(52954) INSERT
6 root RT 0 0 0 0 S 20.5 0.0 39:17.72 [migration/0]
41 root RT 0 0 0 0 S 19.6 0.0 18:21.36 [migration/12]
68 root RT 0 0 0 0 S 19.6 0.0 13:04.62 [migration/21]
74 root RT 0 0 0 0 S 18.9 0.0 13:39.41 [migration/23]
305 root 20 0 0 0 0 S 18.3 0.0 11:34.52 [kworker/27:1]
44 root RT 0 0 0 0 S 17.0 0.0 18:30.71 [migration/13]
89 root RT 0 0 0 0 S 16.0 0.0 12:13.42 [migration/28]
7 root RT 0 0 0 0 S 15.3 0.0 21:58.56 [migration/1]
35 root RT 0 0 0 0 S 15.3 0.0 20:02.05 [migration/10]
53 root RT 0 0 0 0 S 14.0 0.0 12:51.46 [migration/16]
11254 root 0 -20 21848 7532 2788 S 11.7 0.0 22:35.66 atop 1
14 root RT 0 0 0 0 S 10.8 0.0 19:36.56 [migration/3]
26463 postgres 20 0 8601m 7492 5100 R 10.8 0.0 0:00.33 postgres:
other_user xxxxx(32835) INSERT
32 root RT 0 0 0 0 S 10.1 0.0 20:46.18 [migration/9]
16793 root 20 0 0 0 0 S 6.5 0.0 1:12.72 [kworker/25:0]
20 root RT 0 0 0 0 S 5.5 0.0 18:51.81 [migration/5]
48 root 20 0 0 0 0 S 5.5 0.0 3:52.93 [kworker/14:0]
On Tue, Apr 2, 2013 at 10:16 AM, Armand du Plessis <adp(at)bank(dot)io> wrote:
> Touch wood but I think I found the problem thanks to these pointers. I
> checked the vm.zone_reclaim_mode and mine was set to 0. However just
> before the locking starts I can see many of my CPUs flashing red and jump
> to high percentage sys usage. When I look at top it's the migration kernel
> tasks that seem to trigger it.
>
> So it seems it was a bit trigger happy with task migrations, setting the kernel.sched_migration_cost
> to 5000000 (5ms) seemed to have resolved my woes. I'm yet to see locks
> climb and it's been running stable for a bit. This post was invaluable in
> explaining the cause ->
> http://www.postgresql.org/message-id/50E4AAB1.9040902@optionshouse.com
>
> # Postgres Kernel Tweaks
> kernel.sched_migration_cost = 5000000
> # kernel.sched_autogroup_enabled = 0
>
> The second recommended setting 'sched_autogroup_enabled' is not available
> on the kernel I'm running but it doesn't seem to be a problem.
>
> Again, thanks again for the help. It was seriously appreciated. Long night
> was long.
>
> If things change and the problem pops up again I'll update you guys.
>
> Cheers,
>
> Armand
>
>
> On Tue, Apr 2, 2013 at 8:43 AM, Mark Kirkwood <
> mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> wrote:
>
>> Also it is worth checking what your sysctl vm.zone_reclaim_mode is set to
>> - if 1 then override to 0. As Jeff mentioned, this gotcha for larger cpu
>> number machines has been discussed at length on this list - but still traps
>> us now and again!
>>
>> Cheers
>>
>> Mark
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Igor Neyman | 2013-04-02 14:55:55 | Re: Join between 2 tables always executes a sequential scan on the larger table |
Previous Message | Dieter Rehbein | 2013-04-02 08:52:01 | Join between 2 tables always executes a sequential scan on the larger table |