Fwd: Problems with pg_locks explosion

From: Armand du Plessis <adp(at)bank(dot)io>
To: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Fwd: Problems with pg_locks explosion
Date: 2013-04-02 00:31:35
Message-ID: CANf99sWN4LP_xF012miq=MghCYtbm369bup9tVT1+BiTOR9sUA@mail.gmail.com
Lists: pgsql-performance

Thanks Mark,

I had a look at the iostat output (on a 5s interval) and pasted two samples
below. The utilization and waits seem low. Sample #1 was taken during normal
operation; in sample #2, taken when the locks happen, I/O basically drops to 0
across the board. My (mis)understanding of the IOPS was that each volume would
give 1000 IOPS and that in RAID0 the array should give me quite a bit higher
throughput than a single EBS volume setup. (My naive back-of-the-envelope
calculation was #volumes * PIOPS = effective IOPS :/)
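
To make that envelope calculation concrete, here is a rough sketch. It assumes the six striped volumes visible in the iostat output, 1000 provisioned IOPS each, 8 kB Postgres pages, and perfectly even striping; the function name is only illustrative, and real throughput will depend on request size and access pattern.

```python
# Back-of-the-envelope RAID0 estimate.
# Assumptions (not measurements): even striping, every I/O is one 8 kB page.

def raid0_estimate(volumes, iops_per_volume, io_size_bytes=8192):
    """Return (aggregate IOPS, aggregate MB/s) for an idealized stripe."""
    total_iops = volumes * iops_per_volume
    mb_per_s = total_iops * io_size_bytes / 1_000_000
    return total_iops, mb_per_s

# One volume versus the six-volume array (xvdf..xvdk).
single_iops, single_mbps = raid0_estimate(volumes=1, iops_per_volume=1000)
array_iops, array_mbps = raid0_estimate(volumes=6, iops_per_volume=1000)

print(f"single volume: {single_iops} IOPS, about {single_mbps:.1f} MB/s")
print(f"6-volume RAID0: {array_iops} IOPS, about {array_mbps:.1f} MB/s")
```

This is an upper bound only: it says what the array could deliver if every volume were kept equally busy, not what the workload actually achieves.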

Device:   rrqm/s  wrqm/s     r/s   w/s    rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
xvda        0.00    0.00    0.00  0.00      0.00    0.00      0.00      0.00   0.00   0.00   0.00
xvdk        0.00    0.00  141.60  0.00   5084.80    0.00     35.91      0.43   3.06   0.51   7.28
xvdj        0.00    0.00  140.40  0.40   4614.40   24.00     32.94      0.49   3.45   0.52   7.28
xvdi        0.00    0.00  123.00  2.00   4019.20  163.20     33.46      0.33   2.63   0.68   8.48
xvdh        0.00    0.00  139.80  0.80   4787.20   67.20     34.53      0.52   3.73   0.55   7.68
xvdg        0.00    0.00  143.80  0.20   4804.80   16.00     33.48      0.86   6.03   0.72  10.40
xvdf        0.00    0.00  146.40  0.00   4758.40    0.00     32.50      0.55   3.76   0.55   8.00
md127       0.00    0.00  831.20  3.40  27867.20  270.40     33.71      0.00   0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00  100.00    0.00    0.00    0.00

Device:   rrqm/s  wrqm/s     r/s   w/s    rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
xvda        0.00    0.00    0.00  0.00      0.00    0.00      0.00      0.00   0.00   0.00   0.00
xvdk        0.00    0.00    0.00  0.00      0.00    0.00      0.00      0.00   0.00   0.00   0.00
xvdj        0.00    0.00    0.00  0.00      0.00    0.00      0.00      0.00   0.00   0.00   0.00
xvdi        0.00    0.00    0.00  0.00      0.00    0.00      0.00      0.00   0.00   0.00   0.00
xvdh        0.00    0.00    0.00  0.00      0.00    0.00      0.00      0.00   0.00   0.00   0.00
xvdg        0.00    0.00    0.00  0.00      0.00    0.00      0.00      0.00   0.00   0.00   0.00
xvdf        0.00    0.00    0.00  0.00      0.00    0.00      0.00      0.00   0.00   0.00   0.00
md127       0.00    0.00    0.00  0.00      0.00    0.00      0.00      0.00   0.00   0.00   0.00

It only spikes to 100% util when the server restarts. What bugs me, though,
is that Cloud Metrics show 100% throughput on all the volumes despite the
output above.
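
For comparison with the provisioned figures, the md127 line from the first iostat sample can be converted into IOPS, MB/s and average request size. This is only a sketch: the helper name is made up, and it relies on iostat's rsec/s and wsec/s columns counting 512-byte sectors.

```python
SECTOR_BYTES = 512  # iostat's rsec/s and wsec/s count 512-byte sectors

def summarize(r_s, w_s, rsec_s, wsec_s):
    """Return (IOPS, MB/s, average request size in kB) for one iostat row."""
    iops = r_s + w_s
    bytes_per_s = (rsec_s + wsec_s) * SECTOR_BYTES
    avg_kb = bytes_per_s / iops / 1000 if iops else 0.0
    return iops, bytes_per_s / 1_000_000, avg_kb

# md127 row from the sample taken during normal operation.
iops, mb_s, avg_kb = summarize(r_s=831.20, w_s=3.40, rsec_s=27867.20, wsec_s=270.40)
print(f"{iops:.1f} IOPS, {mb_s:.2f} MB/s, {avg_kb:.1f} kB per request")
```

The average request size that comes out should agree with the avgrq-sz column (in sectors) multiplied by 512 bytes, which is a quick sanity check on the conversion.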

I'm looking into the vm.dirty_background_ratio and vm.dirty_ratio sysctls.
Is there any guidance or links available that would be useful as a starting
point?
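
As a rough way to see why these two sysctls matter on a 60G machine, the sketch below estimates how much dirty data each threshold allows and how long it would take to flush. The ratios (10 and 20, then 1 and 2) and the 48 MB/s write rate (six volumes at roughly 8 MB/s each) are assumptions for illustration, not values read from this server. The kernel also applies the ratios to available memory rather than total RAM, so these numbers are upper bounds.

```python
def dirty_thresholds_gb(mem_gb, background_ratio, dirty_ratio):
    """Approximate dirty-page thresholds in GB, treating all RAM as dirtyable."""
    background_gb = mem_gb * background_ratio / 100
    hard_gb = mem_gb * dirty_ratio / 100
    return background_gb, hard_gb

def flush_seconds(dirty_gb, write_mb_per_s):
    """Time to write out dirty_gb at a sustained write rate (1 GB = 1000 MB)."""
    return dirty_gb * 1000 / write_mb_per_s

MEM_GB = 60          # from the thread
WRITE_MB_S = 48.0    # assumed: 6 volumes at about 8 MB/s each

# Assumed ratio pairs: a commonly seen default-like setting and a much lower one.
for bg_ratio, hard_ratio in [(10, 20), (1, 2)]:
    bg_gb, hard_gb = dirty_thresholds_gb(MEM_GB, bg_ratio, hard_ratio)
    secs = flush_seconds(bg_gb, WRITE_MB_S)
    print(f"ratios {bg_ratio}/{hard_ratio}: background {bg_gb:.1f} GB, "
          f"hard limit {hard_gb:.1f} GB, about {secs:.0f} s to flush the background amount")
```

The point of the comparison is only the order of magnitude: with large ratios and slow volumes, a flush can take minutes, while smaller ratios keep each burst short.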

Thanks again for the help, I really appreciate it.

Regards,

Armand

On Tue, Apr 2, 2013 at 2:11 AM, Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> wrote:

> In addition to tuning the various Postgres config knobs you may need to
> look at how your AWS server is set up. If your load is causing an IO stall
> then *symptoms* of this will be lots of locks...
>
> You have quite a lot of memory (60G), so look at tuning the
> vm.dirty_background_ratio, vm.dirty_ratio sysctls to avoid trying to
> *suddenly* write out many gigs of dirty buffers.
>
> Your provisioned volumes are much better than the default AWS ones, but
> are still not hugely fast (i.e. 1000 IOPS is about 8 MB/s worth of Postgres
> 8k buffers). So you may need to look at adding more volumes into the array,
> or adding some separate ones and putting pg_xlog directory on 'em.
>
> However before making changes I would recommend using iostat or sar to
> monitor how volumes are handling the load (I usually choose a 1 sec
> granularity and look for 100% util and high - several hundred ms - awaits).
> Also iotop could be enlightening.
>
> Regards
>
> Mark
