Re: Raid 10 chunksize

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Raid 10 chunksize
Date: 2009-03-25 01:48:36
Message-ID: C5EEDB84.3B34%scott@richrelevance.com
Lists: pgsql-performance


On 3/24/09 6:09 PM, "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz> wrote:

> I'm trying to pin down some performance issues with a machine where I
> work, we are seeing (read only) query response times blow out by an
> order of magnitude or more at busy times. Initially we blamed
> autovacuum, but after a tweak of the cost_delay it is *not* the
> problem. Then I looked at checkpoints... and although there was some
> correlation between them and the query response - I'm thinking that the
> raid chunksize may well be the issue.
>
> Fortunately there is an identical DR box, so I could do a little
> testing. Details follow:
>
> Sun 4140 2x quad-core opteron 2356 16G RAM, 6x 15K 140G SAS
> Debian Lenny
> Pg 8.3.6
>
> The disk is laid out using software (md) raid:
>
> 4 drives raid 10 *4K* chunksize with database files (ext3 ordered, noatime)
> 2 drives raid 1 with database transaction logs (ext3 ordered, noatime)
>

>
> Top looks like:
>
> Cpu(s): 2.5%us, 1.9%sy, 0.0%ni, 71.9%id, 23.4%wa, 0.2%hi, 0.2%si,
> 0.0%st
> Mem: 16474084k total, 15750384k used, 723700k free, 1654320k buffers
> Swap: 2104440k total, 944k used, 2103496k free, 13552720k cached
>
> It looks to me like we are maxing out the raid 10 array, and I suspect
> the chunksize (4K) is the culprit. However as this is a pest to change
> (!) I'd like some opinions on whether I'm jumping to conclusions. I'd
> also appreciate comments about what chunksize to use (I've tended to use
> 256K in the past, but what are folks preferring these days?)
>
> regards
>
> Mark
>
>

md tends to work great with 1MB chunk sizes on RAID 1 or 10, for whatever
reason. Unlike a hardware raid card, md won't read the whole 1MB chunk or
cache much on a small random read, so smaller chunks don't buy you anything
for random i/o. Make sure any partitions built on top of md are 1MB aligned
if you go that route. Random I/O on files smaller than 1MB would be affected
-- but that's not a problem on a 16GB RAM server running a database that
won't fit in RAM.
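A minimal sketch of what that looks like -- the device names and the
partition start sector below are assumptions for illustration, not values
from this thread:

```shell
# Sketch only: /dev/md0 and /dev/sd[abcd]1 are hypothetical device names.
# Creating the 4-drive RAID 10 with a 1MB chunk would look roughly like:
#   mdadm --create /dev/md0 --level=10 --chunk=1024 --raid-devices=4 /dev/sd[abcd]1
#
# The alignment check itself is just arithmetic: a partition is 1MB aligned
# when its starting byte offset is a multiple of the chunk size.
CHUNK_KB=1024            # md chunk size in KiB
SECTOR_BYTES=512         # traditional sector size
START_SECTOR=2048        # example partition start (sector 2048 = 1MiB)

offset=$((START_SECTOR * SECTOR_BYTES))
chunk=$((CHUNK_KB * 1024))

if [ $((offset % chunk)) -eq 0 ]; then
    echo "start sector $START_SECTOR is ${CHUNK_KB}K aligned"
else
    echo "start sector $START_SECTOR is misaligned"
fi
```

Older fdisk defaults started partitions at sector 63, which fails this
check -- that's the misalignment to watch for.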

Your xlogs are occasionally close to max utilization too -- which is
suspicious at only 10MB/sec. There is no reason for them to be on ext3:
the transaction log syncs its own writes, so filesystem journaling buys
you nothing there. Ext2 on that partition will lower the sync times and
reduce i/o utilization.
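A sketch of what the switch might look like -- the device and mount point
names here are assumptions for illustration:

```shell
# Hypothetical names: /dev/md1 is the 2-drive RAID 1 holding the
# transaction logs, mounted at /pg_xlog. Don't run against a live system.

# Rebuild the xlog partition as ext2 (no journal):
mkfs.ext2 /dev/md1

# Mount with noatime, matching the existing filesystems:
mount -o noatime /dev/md1 /pg_xlog

# Or persist it in /etc/fstab:
#   /dev/md1  /pg_xlog  ext2  noatime  0  2
```

Since the xlog is fsync'd on every commit anyway, the ext3 journal only
adds extra writes on that partition; ext2 skips them.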

I also tend to use xfs if sequential access is important at all (obviously
not so in pgbench). ext3 is slightly safer in a power failure with unsynced
data, but Postgres has that covered with its own journal anyway, so those
differences are irrelevant.
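If you do go the xfs route, it helps to tell mkfs the array geometry so
allocation lines up with full stripes. A sketch, assuming the 4-drive
RAID 10 with a 1MB chunk (device name and mount point hypothetical):

```shell
# su = stripe unit (the md chunk size); sw = number of data-bearing
# stripes (a 4-drive RAID 10 has 2). Full stripe = su * sw = 2MB here.
mkfs.xfs -d su=1024k,sw=2 /dev/md0
mount -o noatime /dev/md0 /data
```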
