Re: Filesystem options for storing pg_data

From: Marco Colombo <pgsql(at)esiway(dot)net>
To: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Filesystem options for storing pg_data
Date: 2005-04-21 19:59:29
Message-ID: Pine.LNX.4.61.0504211943220.27506@Megathlon.ESI
Lists: pgsql-general

On Thu, 21 Apr 2005, Scott Marlowe wrote:

> References:
>
> http://archives.postgresql.org/pgsql-performance/2005-01/msg00131.php
> http://archives.postgresql.org/pgsql-performance/2004-05/msg00130.php
> http://archives.postgresql.org/pgsql-performance/2003-08/msg00191.php
> http://groups-beta.google.com/group/comp.os.linux.misc/msg/b299a71fd540c2b8?q=ext2+corrupt+%22power+failure%22&hl=en&lr=&ie=UTF-8&rnum=9
> http://oss.sgi.com/projects/xfs/papers/filesystem-perf-tm.pdf
> http://www.oracle.com/technology/oramag/webcolumns/2002/techarticles/scalzo_linux02.html
> http://jamesthornton.com/hotlist/linux-filesystems/
>
> It took me all of about 10 minutes to find all of those. But I've got
> work to do, so I'll leave further research here to the rest of the list.

Thanks for your precious time, but when I say I searched the archives
I really mean it. If you cared to read _my_ message, I was looking for
any benchmark (or comment) under the following conditions:

1) PostgreSQL load - that is, a benchmark based on PostgreSQL, or,
alternatively, on another database, or on an artificial write+fsync load.
Any other (cached) write load is _meaningless_ for our purposes.
2) the author was aware of mount options, and actually used them.
I think there's enough evidence that ext3's default mount options
are on the safe side (_safer_ than other filesystems, it seems), so there's
no point in comparing default ext3 alone (comparing all modes
_is_ interesting, though).
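To make requirement 1) concrete: a minimal sketch of an artificial write+fsync load, timed against the same writes left to the page cache. Nothing here comes from any of the benchmarks discussed; the 8 kB block size is an assumption chosen to match PostgreSQL's default page size, and the target directory is hypothetical (you would point it at a filesystem mounted with the options under test, e.g. ext3 with data=ordered, data=journal or data=writeback).

```python
import os
import tempfile
import time


def write_load(path, n_writes=200, block_size=8192, sync=True):
    """Write n_writes blocks of block_size bytes to path; return elapsed seconds.

    With sync=True every write is followed by os.fsync(), forcing the data to
    stable storage before the next write (the pattern a database commit needs).
    With sync=False the writes can be absorbed by the OS page cache.
    """
    block = b"x" * block_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(n_writes):
            os.write(fd, block)
            if sync:
                os.fsync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical location: replace with a directory on the filesystem
    # (and mount options) you actually want to measure.
    path = os.path.join(tempfile.gettempdir(), "fsync_bench.dat")
    cached = write_load(path, sync=False)
    synced = write_load(path, sync=True)
    os.remove(path)
    print(f"cached writes: {cached:.4f} s")
    print(f"write+fsync:   {synced:.4f} s")
```

Running the same script on each candidate filesystem/mount-option combination gives the kind of number that is relevant here; the "cached writes" figure is shown only to illustrate how little it says about the fsync case.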

I've spent much more than 10 minutes of my time, and found nothing
except the links that _I_ posted.

I'll invest more time, and comment on the links you posted
(which I had read already, of course):

http://archives.postgresql.org/pgsql-performance/2005-01/msg00131.php
It's not clear at all; it possibly fails both 1) and 2). The author
says nothing about a write+fsync benchmark or about ext3 mount options.

http://archives.postgresql.org/pgsql-performance/2004-05/msg00130.php
That's the one I got Bert Scalzo's article from. The other links there
fail to meet 1), and some fail 2) as well. Note that fsync is likely to
disrupt most optimizations. The fact that a filesystem "scales better"
under normal (cached) load means nothing when it comes to fsyncing.

http://archives.postgresql.org/pgsql-performance/2003-08/msg00191.php
This _defends_ ext2 from the accusation of being buggy. The author
prefers XFS, "but I only have fuzzy reasons, as opposed to metrics."
I was looking for metrics. It says nothing about ext3, so it does not
apply.

These are not from postgresql lists, but anyway:

http://groups-beta.google.com/group/comp.os.linux.misc/msg/b299a71fd540c2b8?q=ext2+corrupt+%22power+failure%22&hl=en&lr=&ie=UTF-8&rnum=9
"People are referring to the old ext2 filesystem here. The new ext3 is very
resistant to this issue."
If you're referring to what "Jinny" said, well, the only evidence given
is "...recently I have come to know from a reliable group that Linux
is not so stable". It meets neither 1) nor 2), sorry.

http://oss.sgi.com/projects/xfs/papers/filesystem-perf-tm.pdf
Yes, surprisingly enough, I've read this one, too. The only interesting
part is "[XFS] Performance features include asynchronous write ahead
logging (similar to Ext2 " - no, ext3 - " with data=writeback), ...". This
confirms my comment about comparing apples and oranges, and completely
justifies my requirement 2) - and it comes from an XFS paper!
It's not clear at all if what they call OLTP Workload really performs
fsync after write. Anyway, there's only _one_ graph in the results
(how weird) and all filesystems are pretty close. No tests with
data=journal. All other graphs in the Appendix fail requirement 1).

http://www.oracle.com/technology/oramag/webcolumns/2002/techarticles/scalzo_linux02.html
Thanks, this is the link that _I_ posted. Have _you_ read it?
It shows that EXT3 is almost twice as fast as JFS. Too bad there's no
XFS here.
BTW, this meets 1); I'm not sure about 2), but the options they used
seem to be enough to outperform JFS.

http://jamesthornton.com/hotlist/linux-filesystems/
this is just a collection of links. It's not clear which one would
back up your claim of XFS and JFS being _generally_ considered superior
for PostgreSQL or other database usage.

Let's see:
http://www-106.ibm.com/developerworks/linux/library/l-fs8.html
"data=ordered mode effectively solves the corruption problem found in
data=writeback mode and _most other journaled filesystems_, and it does
so without requiring full data journaling"

(emphasis mine) Interestingly enough, most journaled filesystems do have
a corruption problem; ext3 in default mode doesn't.
But this does not really apply to us: it refers to normal writes, not
write+fsyncs. I think any fs has to be badly broken if it loses data
after fsync, anyway.
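For reference, "after fsync" means roughly the following on Linux: a minimal sketch of a write that returns only once the data should survive a crash. It assumes Linux/POSIX semantics, where fsync() on the file flushes its contents and a separate fsync() on the parent directory is needed to make a newly created directory entry durable; the function name durable_write is hypothetical, not any library's API.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data to path, returning only once it should survive a crash.

    Assumes Linux/POSIX semantics: fsync() on the file flushes its data,
    and fsync() on the parent directory makes a new directory entry durable.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while len(view):  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # push file contents to stable storage
    finally:
        os.close(fd)
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # make the (possibly new) directory entry durable
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    target = os.path.join(tempfile.gettempdir(), "durable_demo.dat")
    durable_write(target, b"committed record\n")
    with open(target, "rb") as f:
        print(f.read())
    os.remove(target)
```

Once durable_write() has returned, losing that data on power failure would be exactly the "badly broken" case above, whatever the journaling mode.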

http://www-106.ibm.com/developerworks/library/l-fs9.html
"Other than that, XFS performance was very close to that of ReiserFS and
generally surpasses that of ext3... "

uh, this sounds interesting... but wait...

"... One of the nicest things about XFS is that, like ReiserFS, it doesn't
generate a lot of unnecessary disk activity. XFS tries to cache as much
data in memory as possible, and generally only writes things out to disk
when memory pressure dictates that it do so."

So, if a benchmark shows XFS is faster, it's a matter of better caching,
right? But that comes at the price of possible (data) corruption...
Thankfully, that caching is pretty useless to us, with every write followed by a sync.

I'm sorry, but with the links _you_ selected, applying my filter
conditions 1) and 2), which are necessary for a fair comparison,
one could say there's general consensus on EXT3 being far superior
to other filesystems, not the opposite.

Note that I'm not interested in supporting such a claim. As I already
wrote, I think FS selection generally has a minimal impact on PostgreSQL
performance.

But again, what was your original claim
"Generally XFS and JFS are considered superior to ext2/3."
based upon?

I apologize if I sound somewhat harsh; it's not really intended.
But next time please assume that:
- I'm able to do a 10 minute search;
- I've got some work to do, too, but I'm willing to spend more than
10 minutes on this research (it has actually taken me more than 2 hours
of my spare time already);
- if I say I've searched the lists and read many messages, I've
really done so.

You're absolutely entitled to your opinion. If you like XFS and
JFS, go ahead and use them, because of their names, the names of their
sponsors (IBM and SGI), their features, your personal experience,
or whatever. Just please don't claim there's a general consensus on the
pgsql lists. There's _no_ general consensus. There's _no_ clear winner.
And if you do want a winner anyway, it's ext3, so far.

This "ext3 is not as good as XFS or JFS" claim is a recurring subject, as
is "ext3 is buggy". _Every single time_ I've asked for
references to back up such claims, nothing valuable was presented.
On the contrary, the only references I've found are on the
"ext3 is equal or better" side.

Now, feel free to prove me wrong.

.TM.
--
____/ ____/ /
/ / / Marco Colombo
___/ ___ / / Technical Manager
/ / / ESI s.r.l.
_____/ _____/ _/ Colombo(at)ESI(dot)it
