Re: Disk latency goes up during certain periods

From: German Becker <german(dot)becker(at)gmail(dot)com>
To: bricklen <bricklen(at)gmail(dot)com>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Disk latency goes up during certain periods
Date: 2013-07-31 16:25:41
Message-ID: CALyjCLuDgczTxoqiXKSThqMwwAoPM_4rPE4+p8zr_g=k11crtA@mail.gmail.com
Lists: pgsql-admin

To all who might be interested, I have an update on this.
I ran some tests on the old production DB, which was Postgres 8.3 (and had
only one disk for everything), using pgreplay to run the same queries as on
the 9.1 server.
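
In case anyone wants to reproduce the comparison, this is roughly how such a
replay can be driven with pgreplay (a sketch; the log file name, host and port
are placeholders, and the server log has to be produced with the log settings
pgreplay requires):

  # parse the statement log from the 9.1 server into a replay file
  pgreplay -f -o statements.replay postgresql-9.1.log

  # replay it against the 8.3 server at the original speed
  pgreplay -r -h old-db-host -p 5432 statements.replay
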
Here is the iostat output for the 8.3 server (one line per sampling interval):

Device:         rrqm/s   wrqm/s     r/s      w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   230.00   22.00   106.00   308.00  2692.00    23.44     0.33    2.58   2.34  30.00
sda               0.00   166.00    9.50    65.50   160.00  1708.00    24.91     0.29    3.07   2.47  18.50
sda               0.00   236.50    7.50   118.50   120.00  2984.00    24.63     0.39    3.61   1.55  19.50
sda               0.00   310.50    7.50   168.50   112.00  3832.00    22.41     0.44    2.50   0.94  16.50
sda               0.00   321.50   22.50   184.00   320.00  4048.00    21.15     0.88    4.24   1.74  36.00
sda               0.00   266.00    4.50   155.00    64.00  3356.00    21.44     0.29    1.72   0.88  14.00

Here is the output for 9.1 (again one line per interval):

Device:         rrqm/s   wrqm/s     r/s      w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0.00     0.00    0.00    85.00     0.00   352.00     8.28     0.29    3.46    0.00    3.46   3.46  29.40
sdb               0.00     0.50    0.00    97.00     0.00   450.00     9.28     0.39    4.04    0.00    4.04   3.79  36.80
sdb               0.00     0.00    0.00    87.00     0.00   376.00     8.64     0.29    3.29    0.00    3.29   3.29  28.60
sdb               0.00     0.50    0.00    92.00     0.00   386.00     8.39     0.32    3.43    0.00    3.43   3.28  30.20
sdb               0.00     0.00    0.00    89.50     0.00   388.00     8.67     0.33    3.66    0.00    3.66   3.66  32.80
sdb               0.00     0.00    0.00   104.50     0.00   432.00     8.27     0.38    3.62    0.00    3.62   3.62  37.80

(The columns are not the same, probably because the two Ubuntu releases ship
different sysstat/iostat versions.)
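
If it helps to compare the two in the same units, one option would be to force
kB/s on both hosts; I believe the -k flag does this even on the older sysstat,
though I have not double-checked it there:

  # extended per-device stats every 2 seconds, throughput in kB/s
  iostat -xk 2
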
What is notable is the following:

- 8.3 shows much lower disk utilization, even though on that server the same
  disk is also used for everything else (for example, the plain-text log).
- I think the main difference is in average request size and w/s: in 9.1 the
  request sizes seem to be roughly half of those in 8.3, so the w/s is roughly
  double (a way to measure the WAL volume directly is sketched below).
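
One way to check whether 9.1 really generates more WAL for the same workload
would be to note the WAL position before and after the replay on each server;
the difference between the two positions is the volume of WAL written (a rough
sketch; pg_current_xlog_location() exists in both 8.3 and 9.1):

  # record the WAL position, run the pgreplay test, then record it again;
  # the positions are hexadecimal file/offset pairs
  psql -c "SELECT pg_current_xlog_location();"
  # ... run the replay ...
  psql -c "SELECT pg_current_xlog_location();"
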

I think this might be because of the different wal_level setting: in 8.3 we
were using archive (hot_standby was not available yet), and in 9.1 we are
using hot_standby. Here is what the documentation says about this:

"It is thought that there is little measurable difference in performance
between using hot_standby and archive levels, so feedback is welcome if any
production impacts are noticeable."

Although, of course, this might just be a difference between the Postgres
releases themselves.
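
To rule wal_level in or out, one could repeat the replay against a test copy
of the 9.1 server running at the 8.3-equivalent level (a sketch; changing
wal_level requires a restart, and the data directory path is a placeholder):

  # check the current setting
  psql -c "SHOW wal_level;"

  # in postgresql.conf on the test copy:
  #   wal_level = archive
  pg_ctl restart -D /path/to/data

  # then repeat the pgreplay run and compare iostat again
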

Any thoughts or feedback would be appreciated.
Cheers,

Germán Becker

On Tue, Jul 30, 2013 at 1:02 PM, bricklen <bricklen(at)gmail(dot)com> wrote:

> On Tue, Jul 30, 2013 at 8:35 AM, German Becker <german(dot)becker(at)gmail(dot)com>wrote:
>
>> 256 was set some time ago when we were testing a different issue. I read
>> that the only drawback is the amount of time required for recovery, which
>> we tested and it was about 10 seconds for the 256 segments, and higher
>> values mean less disk usage.
>> Anyway, all these parameters should affect the throughput to the data
>> disks, not the WAL. Am I right?
>>
>>
> checkpoint_completion_target is to help with "checkpoint smoothing", to
> reduce the spike in disk I/O when shared_buffers are written out. Depesz
> has a good article about that:
> http://www.depesz.com/2010/11/03/checkpoint_completion_target/
>
> Do your graphs show any correlation between number of WAL segments getting
> recycled, and disk I/O spikes? Are you logging checkpoints? If so, you
> could use the checkpoint times to compare against your I/O graphs. I am by
> no means an expert here, I'm just throwing out ideas (which might already
> have been suggested).
>
>
>
