Re: Raid 10 chunksize

From: Scott Carey <scott(at)richrelevance(dot)com>
To: "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Raid 10 chunksize
Date: 2009-04-04 05:24:52
Message-ID: C5FC3D34.4343%scott@richrelevance.com
Lists: pgsql-performance

On 4/3/09 6:05 PM, "david(at)lang(dot)hm" <david(at)lang(dot)hm> wrote:

> On Fri, 3 Apr 2009, Greg Smith wrote:
>
>> Hannes sent this off-list, presumably via newsgroup, and it's certainly worth
>> sharing. I've always been scared off of using XFS because of the problems
>> outlined at http://zork.net/~nick/mail/why-reiserfs-is-teh-sukc , with more
>> testing showing similar issues at http://pages.cs.wisc.edu/~vshree/xfs.pdf
>> too
>>
>> (I'm finding that old message with Ted saying "Making sure you don't lose
>> data is Job #1" hilarious right now, considering the recent ext4 data loss
>> debacle)
>
> also note that the message from Ted was back in 2004, there has been a
> _lot_ of work done on XFS in the last 4 years.
>
> as for the second link, that focuses on what happens to the filesystem if
> the disk under it starts returning errors or garbage. with the _possible_
> exception of ZFS, every filesystem around will do strange things under
> those conditions. and in my opinion, the way to deal with this sort of
> thing isn't to move to ZFS to detect the problem, it's to set up redundancy
> in your storage so that you can not only detect the problem, but correct
> it as well (it's a good thing to know that your database file is corrupt,
> but that's not nearly as useful as having some way to recover the data
> that was there)

Not trying to spread too much kool-aid around, but ZFS does that.

If a mirror set (which might hold 2, 3 or more copies) detects a checksum
error on a block, it reads the other copies and attempts to repair the bad
block from a good one.

PLUS, read performance under normal conditions scales with the number of
disks in the mirrors. 12 disks in raid 10 do writes as fast as a 6 disk
raid 0, but reads as fast as a 12 disk raid 0, since it does not have to
read both sides of a mirror to detect an error, only to recover from one.
You can even just write zeros to random spots on one side of a mirror and
it will throw checksum errors and use the other copies.
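
To make the idea concrete, here is a rough Python sketch of a checksummed
mirror. This is not ZFS code and not how any real RAID card works; the
MirrorSet class, its dict-backed "disks" and the choice of CRC32 are all
made up for illustration. It only shows the shape of the logic: a normal
read touches one copy, and the other copies are read only when the
checksum does not match.

```python
import zlib


class MirrorSet:
    """Toy N-way mirror that keeps a CRC32 checksum for every block.

    Hypothetical illustration only: each "disk" is a dict mapping a
    block number to bytes.
    """

    def __init__(self, copies=2):
        self.disks = [dict() for _ in range(copies)]
        # Checksums are stored apart from the data they protect.
        self.checksums = {}
        # Round-robin pointer so healthy reads spread over all copies.
        self.next_disk = 0

    def write(self, blockno, data):
        # A write must go to every copy, so writes do not scale
        # with the number of mirrors.
        self.checksums[blockno] = zlib.crc32(data)
        for disk in self.disks:
            disk[blockno] = data

    def read(self, blockno):
        expected = self.checksums[blockno]
        n = len(self.disks)
        start = self.next_disk
        self.next_disk = (start + 1) % n
        bad = []
        for i in range(n):
            idx = (start + i) % n
            data = self.disks[idx][blockno]
            if zlib.crc32(data) == expected:
                # Good copy found: rewrite any copies that failed.
                for b in bad:
                    self.disks[b][blockno] = data
                return data
            bad.append(idx)
        raise IOError("all copies of block %d failed checksum" % blockno)
```

With two copies, overwriting a block on one "disk" and then reading it
returns the good data from the other copy and rewrites the damaged one.
In the healthy case the loop exits on its first iteration, which is why
reads can be spread across every disk in the mirror.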

This really isn't a ZFS promotion, rather it's a promotion of the power of
checksums at the file system and raid level. A hardware raid card could
just as well sacrifice some space to place checksums on its blocks and get
much the same result.

>
> David Lang
>
