From: Scott Carey <scott(at)richrelevance(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Raid 10 chunksize
Date: 2009-04-02 20:20:15
Message-ID: C5FA6C0F.41E8%scott@richrelevance.com
Lists: pgsql-performance
On 4/2/09 10:58 AM, "Merlin Moncure" <mmoncure(at)gmail(dot)com> wrote:
> On Wed, Mar 25, 2009 at 12:16 PM, Scott Carey <scott(at)richrelevance(dot)com> wrote:
>> On 3/25/09 1:07 AM, "Greg Smith" <gsmith(at)gregsmith(dot)com> wrote:
>>> On Wed, 25 Mar 2009, Mark Kirkwood wrote:
>>>> I'm thinking that the raid chunksize may well be the issue.
>>>
>>> Why? I'm not saying you're wrong, I just don't see why that parameter
>>> jumped out as a likely cause here.
>>>
>>
>> If postgres is random reading or writing at 8k block size, and the raid
>> array is set with 4k block size, then every 8k random i/o will create TWO
>> disk seeks since it gets split to two disks. Effectively, iops will be cut
>> in half.
>
> I disagree. The 4k raid chunks are likely to be grouped together on
> disk and read sequentially. This will only give two seeks in special
> cases.
By definition, adjacent raid blocks in a stripe are on different disks.
> Now, if the PostgreSQL block size is _smaller_ than the raid
> chunk size, random writes can get expensive (especially for raid 5)
> because the raid chunk has to be fully read in and written back out.
> But this is mainly a theoretical problem I think.
This is false and a RAID-5 myth. New parity can be constructed from the old
parity + the change in data. Only 2 blocks have to be accessed, not the
whole stripe.
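To make the XOR arithmetic concrete, here is a rough Python sketch (an
illustration only, not md's actual code path; the chunk contents, the
function names, and the 3-disk stripe are made up for the example):

def xor_bytes(a, b):
    # XOR two equal-length byte strings
    return bytes(x ^ y for x, y in zip(a, b))

def raid5_small_write(old_data, old_parity, new_data):
    # new parity = old parity XOR old data XOR new data
    # Two reads (old data, old parity) and two writes (new data, new parity),
    # no matter how many disks are in the stripe.
    return xor_bytes(old_parity, xor_bytes(old_data, new_data))

# Sanity check on a 3-disk stripe (two data chunks + parity):
d0, d1 = b"\x0f" * 4, b"\xf0" * 4
parity = xor_bytes(d0, d1)
new_d0 = b"\xaa" * 4
# Same result as recomputing parity from the full stripe, without touching d1:
assert raid5_small_write(d0, parity, new_d0) == xor_bytes(new_d0, d1)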
Plus, this was about RAID 10 or 0 where parity does not apply.
>
> I'm going to go out on a limb and say that for block sizes that are
> within one or two 'powers of two' of each other, it doesn't matter a
> whole lot. SSDs might be different, because of the 'erase' block
> which might be 128k, but I bet this is dealt with in such a fashion
> that you wouldn't really notice it when dealing with different block
> sizes in pg.
Well, the raid block size can be significantly larger than the postgres or
file system block size, and the performance of random reads / writes won't
get worse with larger block sizes. This holds only for RAID 0 (or 10);
parity is the ONLY thing that makes larger block sizes bad, since there is a
read-modify-write type operation on something the size of one block.
A raid block size smaller than the postgres block is always bad and
multiplies random i/o.
Read an 8k postgres block in an 8MB md raid 0 block, and you read 8k from one
disk.
Read an 8k postgres block on an md raid 0 with 4k blocks, and you read 4k from
each of two disks.
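If it helps, here is a rough Python sketch of that mapping (the round-robin
chunk-to-disk layout below is the textbook striping scheme, not md's exact
geometry; the 4-disk array, offsets, and function name are made-up examples):

def disks_touched(offset, length, chunk_size, n_disks):
    # Which member disks does a contiguous read [offset, offset + length) hit,
    # assuming chunk i of the stripe lives on disk i % n_disks?
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    return {chunk % n_disks for chunk in range(first_chunk, last_chunk + 1)}

BLOCK = 8 * 1024   # one postgres block
print(disks_touched(0, BLOCK, 4 * 1024, 4))          # {0, 1}: two disks, two seeks
print(disks_touched(0, BLOCK, 8 * 1024 * 1024, 4))   # {0}: one disk, one seek

With 4k chunks every 8k read spans two chunks and therefore two disks; with
8MB chunks it essentially never does.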