Re: Raid 10 chunksize

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Raid 10 chunksize
Date: 2009-04-02 20:27:06
Message-ID: b42b73150904021327s40a9a2ccvaa58193b05197ec6@mail.gmail.com
Lists: pgsql-performance

On Thu, Apr 2, 2009 at 4:20 PM, Scott Carey <scott(at)richrelevance(dot)com> wrote:
>
> On 4/2/09 10:58 AM, "Merlin Moncure" <mmoncure(at)gmail(dot)com> wrote:
>
>> On Wed, Mar 25, 2009 at 12:16 PM, Scott Carey <scott(at)richrelevance(dot)com> wrote:
>>> On 3/25/09 1:07 AM, "Greg Smith" <gsmith(at)gregsmith(dot)com> wrote:
>>>> On Wed, 25 Mar 2009, Mark Kirkwood wrote:
>>>>> I'm thinking that the raid chunksize may well be the issue.
>>>>
>>>> Why?  I'm not saying you're wrong, I just don't see why that parameter
>>>> jumped out as a likely cause here.
>>>>
>>>
>>> If postgres is random reading or writing at 8k block size, and the raid
>>> array is set with a 4k chunk size, then every 8k random i/o will create TWO
>>> disk seeks, since it gets split across two disks.  Effectively, iops will be cut
>>> in half.
>>
>> I disagree.  The 4k raid chunks are likely to be grouped together on
>> disk and read sequentially.  This will only give two seeks in special
>> cases.
>
> By definition, adjacent raid blocks in a stripe are on different disks.
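
With plain striping, consecutive chunks rotate across the member disks.  A
throwaway sketch of that mapping (made-up geometry, assuming a plain RAID 0/10
stripe with no layout offsets, not tied to md or any controller):

    def disk_for_offset(offset_bytes, chunk_bytes, n_disks):
        # which member disk holds the chunk containing this byte offset,
        # assuming a plain striped layout
        return (offset_bytes // chunk_bytes) % n_disks

    # with 4k chunks on a 4-disk stripe, consecutive chunks always land on
    # different disks: offsets 0k, 4k, 8k, 12k map to disks 0, 1, 2, 3
    for offset in (0, 4096, 8192, 12288):
        print(offset, disk_for_offset(offset, 4096, 4))
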
>
>
>> Now, if the PostgreSQL block size is _smaller_ than the raid
>> chunk size,  random writes can get expensive (especially for raid 5)
>> because the raid chunk has to be fully read in and written back out.
>> But this is mainly a theoretical problem I think.
>
> This is false and a RAID-5 myth.  New parity can be constructed from the old
> parity + the change in data.  Only 2 blocks have to be accessed, not the
> whole stripe.
>
> Plus, this was about RAID 10 or 0 where parity does not apply.
>
>>
>> I'm going to go out on a limb and say that for block sizes that are
>> within one or two 'powers of two' of each other, it doesn't matter a
>> whole lot.  SSDs might be different, because of the 'erase' block
>> which might be 128k, but I bet this is dealt with in such a fashion
>> that you wouldn't really notice it when dealing with different block
>> sizes in pg.
>
> Well, raid block size can be significantly larger than postgres or file
> system block size and the performance of random reads / writes won't get
> worse with larger block sizes.  This holds only for RAID 0 (or 10); parity
> is the ONLY thing that makes larger block sizes bad, since it forces a
> read-modify-write type operation on something the size of one block.
>
> Raid block sizes smaller than the postgres block are always bad and
> multiply random i/o.
>
> Read an 8k postgres block in an 8MB md raid 0 chunk, and you read 8k from one
> disk.
> Read an 8k postgres block on an md raid 0 with 4k chunks, and you read 4k from
> two disks.

yep...that's good analysis...thinko on my part.
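
To put rough numbers on the 8k-read example quoted above (a sketch with
made-up geometry: 4 member disks, aligned reads, plain striping):

    def disks_touched(offset, length, chunk, n_disks):
        # distinct member disks a contiguous read spans on a plain stripe
        first = offset // chunk
        last = (offset + length - 1) // chunk
        return len({c % n_disks for c in range(first, last + 1)})

    print(disks_touched(8192, 8192, 4096, 4))          # 4k chunks  -> 2 disks
    print(disks_touched(8192, 8192, 8 * 1024**2, 4))   # 8MB chunks -> 1 disk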

merlin
