Re: Raid 10 chunksize

From: david(at)lang(dot)hm
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Raid 10 chunksize
Date: 2009-04-04 01:05:20
Message-ID: alpine.DEB.1.10.0904031800220.28893@asgard.lang.hm
Lists: pgsql-performance

On Fri, 3 Apr 2009, Greg Smith wrote:

> Hannes sent this off-list, presumably via newsgroup, and it's certainly worth
> sharing. I've always been scared off of using XFS because of the problems
> outlined at http://zork.net/~nick/mail/why-reiserfs-is-teh-sukc , with more
> testing showing similar issues at http://pages.cs.wisc.edu/~vshree/xfs.pdf
> too
>
> (I'm finding that old message with Ted saying "Making sure you don't lose
> data is Job #1" hilarious right now, considering the recent ext4 data loss
> debacle)

Also note that the message from Ted was back in 2004; there has been a
_lot_ of work done on XFS in the years since.

As for the second link, that focuses on what happens to the filesystem if
the disk under it starts returning errors or garbage. With the _possible_
exception of ZFS, every filesystem around will do strange things under
those conditions. And in my opinion, the way to deal with this sort of
thing isn't to move to ZFS to detect the problem; it's to set up redundancy
in your storage so that you can not only detect the problem, but correct
it as well. (It's a good thing to know that your database file is corrupt,
but that's not nearly as useful as having some way to recover the data
that was there.)
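
To make the detect-versus-correct distinction concrete, here is a minimal
Python sketch. It is only an illustration, not how any particular RAID or
filesystem implementation works: it assumes two mirrored copies of a block
plus a CRC32 recorded at write time, and the MirroredBlock class and its
layout are hypothetical.

```python
import zlib


class MirroredBlock:
    """Two copies of one block plus a CRC32 taken at write time (hypothetical layout)."""

    def __init__(self, data: bytes):
        self.copies = [bytearray(data), bytearray(data)]
        self.crc = zlib.crc32(data)

    def read(self) -> bytes:
        # Detection: compare each copy against the recorded checksum.
        for i, copy in enumerate(self.copies):
            if zlib.crc32(bytes(copy)) == self.crc:
                # Correction: overwrite the other copy from the good one.
                for j in range(len(self.copies)):
                    if j != i:
                        self.copies[j] = bytearray(copy)
                return bytes(copy)
        # Without a surviving good copy the damage is detected but not fixable.
        raise IOError("all copies corrupt: detected but not recoverable")


if __name__ == "__main__":
    blk = MirroredBlock(b"database page")
    blk.copies[0][0] ^= 0xFF          # simulate the disk returning garbage
    print(blk.read())                 # served from the intact mirror
```

With a single copy the checksum could only report the corruption; the
second copy is what makes recovery possible.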

David Lang

> ---------- Forwarded message ----------
> Date: Fri, 3 Apr 2009 10:19:38 +0200
> From: Hannes Dorbath <light(at)theendofthetunnel(dot)de>
> Newsgroups: pgsql.performance
> Subject: Re: [PERFORM] Raid 10 chunksize
>
> Ron Mayer wrote:
>> Greg Smith wrote:
>>> On Wed, 1 Apr 2009, Scott Carey wrote:
>>>
>>>> Write caching on SATA is totally fine. There were some old ATA drives
>>>> that when paired with some file systems or OS's would not be safe. There
>>>> are some combinations that have unsafe write barriers. But there is a
>>>> standard, well-supported ATA command to sync and only return after the
>>>> data is on disk. If you are running an OS that is anything recent at all,
>>>> and any disks that are not really old, you're fine.
>>> While I would like to believe this, I don't trust any claims in this
>>> area that don't have matching tests that demonstrate things working as
>>> expected. And I've never seen this work.
>>>
>>> My laptop has a 7200 RPM drive, which means that if fsync is being
>>> passed through to the disk correctly I can only fsync <120
>>> times/second. Here's what I get when I run sysbench on it, starting
>>> with the default ext3 configuration:
>>
>> I believe it's ext3 who's cheating in this scenario.
>
> I assume so too. Here is the same test using XFS, first with barriers (XFS
> default) and then without:
>
> Linux 2.6.28-gentoo-r2 #1 SMP Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz
> GenuineIntel GNU/Linux
>
> /dev/sdb /data2 xfs rw,noatime,attr2,logbufs=8,logbsize=256k,noquota 0 0
>
> # sysbench --test=fileio --file-fsync-freq=1 --file-num=1
> --file-total-size=16384 --file-test-mode=rndwr run
> sysbench 0.4.10: multi-threaded system evaluation benchmark
>
> Running the test with following options:
> Number of threads: 1
>
> Extra file open flags: 0
> 1 files, 16Kb each
> 16Kb total file size
> Block size 16Kb
> Number of random requests for random IO: 10000
> Read/Write ratio for combined random IO test: 1.50
> Periodic FSYNC enabled, calling fsync() each 1 requests.
> Calling fsync() at the end of test, Enabled.
> Using synchronous I/O mode
> Doing random write test
> Threads started!
> Done.
>
> Operations performed: 0 Read, 10000 Write, 10000 Other = 20000 Total
> Read 0b Written 156.25Mb Total transferred 156.25Mb (463.9Kb/sec)
> 28.99 Requests/sec executed
>
> Test execution summary:
> total time: 344.9013s
> total number of events: 10000
> total time taken by event execution: 0.1453
> per-request statistics:
> min: 0.01ms
> avg: 0.01ms
> max: 0.07ms
> approx. 95 percentile: 0.01ms
>
> Threads fairness:
> events (avg/stddev): 10000.0000/0.00
> execution time (avg/stddev): 0.1453/0.00
>
>
> And now without barriers:
>
> /dev/sdb /data2 xfs
> rw,noatime,attr2,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
>
> # sysbench --test=fileio --file-fsync-freq=1 --file-num=1
> --file-total-size=16384 --file-test-mode=rndwr run
> sysbench 0.4.10: multi-threaded system evaluation benchmark
>
> Running the test with following options:
> Number of threads: 1
>
> Extra file open flags: 0
> 1 files, 16Kb each
> 16Kb total file size
> Block size 16Kb
> Number of random requests for random IO: 10000
> Read/Write ratio for combined random IO test: 1.50
> Periodic FSYNC enabled, calling fsync() each 1 requests.
> Calling fsync() at the end of test, Enabled.
> Using synchronous I/O mode
> Doing random write test
> Threads started!
> Done.
>
> Operations performed: 0 Read, 10000 Write, 10000 Other = 20000 Total
> Read 0b Written 156.25Mb Total transferred 156.25Mb (62.872Mb/sec)
> 4023.81 Requests/sec executed
>
> Test execution summary:
> total time: 2.4852s
> total number of events: 10000
> total time taken by event execution: 0.1325
> per-request statistics:
> min: 0.01ms
> avg: 0.01ms
> max: 0.06ms
> approx. 95 percentile: 0.01ms
>
> Threads fairness:
> events (avg/stddev): 10000.0000/0.00
> execution time (avg/stddev): 0.1325/0.00
>
>
>
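
The two results above can be sanity-checked against the rotational limit
Greg mentions: a drive cannot honestly complete more than about one
synchronous write to the same spot per revolution, so RPM / 60 is a rough
ceiling on fsyncs per second. A minimal Python sketch of that arithmetic
follows. It assumes a 7200 RPM drive (the speed of Hannes's /dev/sdb is not
stated; the figure is borrowed from Greg's laptop), and the function names
are hypothetical.

```python
def max_fsyncs_per_second(rpm: float) -> float:
    """Rough ceiling: at most one committed write per platter revolution."""
    return rpm / 60.0


def looks_cached(requests_per_sec: float, rpm: float) -> bool:
    """True if the reported rate is higher than the platter could deliver."""
    return requests_per_sec > max_fsyncs_per_second(rpm)


if __name__ == "__main__":
    RPM = 7200  # assumption: drive speed is not given in the message
    # Requests/sec figures copied from the two sysbench runs quoted above.
    results = {"xfs, barriers": 28.99, "xfs, nobarrier": 4023.81}
    ceiling = max_fsyncs_per_second(RPM)
    for label, rate in results.items():
        if looks_cached(rate, RPM):
            verdict = "exceeds ceiling, writes likely acknowledged from cache"
        else:
            verdict = "plausible for writes reaching the platter"
        print(f"{label}: {rate:.2f}/s vs ceiling {ceiling:.0f}/s -> {verdict}")
```

Even a 15000 RPM drive would top out around 250 per second by this
estimate, so the nobarrier figure is well past any rotational limit.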
