From: | mark(at)mark(dot)mielke(dot)cc |
---|---|
To: | "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Postgresql Performance on an HP DL385 and |
Date: | 2006-08-15 19:39:51 |
Message-ID: | 20060815193951.GA13695@mark.mielke.cc |
Lists: | pgsql-performance |
On Tue, Aug 15, 2006 at 03:02:56PM -0400, Michael Stone wrote:
> On Tue, Aug 15, 2006 at 02:33:27PM -0400, mark(at)mark(dot)mielke(dot)cc wrote:
> >>>Are 'we' sure that such a setup can't lose any data?
> >>Yes. If you check the archives, you can even find the last time this was
> >>discussed...
> >I looked last night (coincidence actually) and didn't find proof that
> >you cannot lose data.
> You aren't going to find proof, any more than you'll find proof that you
> won't lose data if you do use a journalling fs. (Because there isn't
> any.) Unfortunately, many people misunderstand what a metadata
> journal does for you, and overstate its importance in this type of
> application.
Yes, many people do. :-)
> >How do you deal with the file system structure being updated before the
> >data blocks are (re-)written?
> *That's what the postgres log is for.* If the latest xlog entries don't
> make it to disk, they won't be replayed; if they didn't make it to
> disk, the transaction would not have been reported as committed. An
> application that understands filesystem semantics can guarantee data
> integrity without metadata journaling.
No. This is not true. Updating the file system structure (inodes, indirect
blocks) touches a different part of the disk than the actual data. If
the file system structure is modified, say, to extend a file to allow
it to contain more data, but the data itself is not written, then upon
recovery, with a system such as ext2, or ext3 with writeback, or xfs,
it is possible that the end of the file, even the postgres log file,
will contain a random block of data from the disk. If this random block
of data happens to look like a valid xlog block, it may be replayed,
and the database corrupted.
If the file system is only used for xlog data, the chance that it looks
like a valid block increases, would it not?
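To make the concern concrete, here is a minimal sketch of the scenario. The
record layout (length, CRC-32, payload, fixed 32-byte slots) and the names
make_record and replay are invented for illustration; this is not
PostgreSQL's xlog format, which has additional validation that the sketch
omits. It only shows that a per-record checksum rejects arbitrary garbage
at the end of a file but, on its own, accepts a leftover block that was
once a valid log record.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # hypothetical header: payload length, CRC-32 of payload
BLOCK = 32                     # hypothetical fixed record slot size, for simplicity


def make_record(payload: bytes) -> bytes:
    """Build one fixed-size toy log record: header, payload, zero padding."""
    body = HEADER.pack(len(payload), zlib.crc32(payload)) + payload
    return body.ljust(BLOCK, b"\0")


def replay(log: bytes) -> list:
    """Return the payloads accepted by the CRC check, stopping at the first failure."""
    accepted = []
    for off in range(0, len(log) - BLOCK + 1, BLOCK):
        length, crc = HEADER.unpack_from(log, off)
        if length == 0 or length > BLOCK - HEADER.size:
            break  # implausible length: treat as end of log
        start = off + HEADER.size
        payload = log[start:start + length]
        if zlib.crc32(payload) != crc:
            break  # checksum mismatch: treat as end of log
        accepted.append(payload)
    return accepted


committed = make_record(b"tx1 commit") + make_record(b"tx2 commit")
garbage = bytes(range(100, 100 + BLOCK))   # arbitrary non-log data in the extended tail
stale = make_record(b"old tx9 commit")     # leftover block from an earlier log file

print(replay(committed + garbage))  # tail is rejected
print(replay(committed + stale))    # tail passes the checksum and is replayed
```

In this toy model, the more of the disk that holds old log records, the
more likely it is that an uninitialised tail block is of the second kind.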
> >>The bottom line is that the only reason you need a metadata journalling
> >>filesystem is to save the fsck time when you come up. On a little
> >>partition like xlog, that's not an issue.
> >fsck isn't only about time to fix. fsck is needed, because the file system
> >is broken.
> fsck is needed to reconcile the metadata with the on-disk allocations.
> To do that, it reads all the inodes and their corresponding directory
> entries. The time to do that is proportional to the size of the
> filesystem, hence the comment about time. fsck is not needed "because
> the filesystem is broken", it's needed because the filesystem is marked
> dirty.
This is also wrong. fsck is needed because the file system is broken.
It takes time, because it doesn't have a journal to help it, therefore it
must look through the entire file system and guess what the problems are.
There are classes of problems such as I describe above, for which fsck
*cannot* guess how to solve the problem. There is not enough information
available for it to deduce that anything is wrong at all.
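A rough sketch of why a metadata-only check has nothing to go on in this
case. The Inode class, the allocation bitmap and metadata_consistent are a
toy model made up for illustration, not ext2's on-disk layout or the real
fsck algorithm. The point is only that the check looks at the structure
and never at the contents of the data blocks.

```python
from dataclasses import dataclass

BLOCK_SIZE = 8  # hypothetical block size in bytes


@dataclass
class Inode:
    size: int     # file length in bytes, as recorded in metadata
    blocks: list  # indices of the data blocks the file owns


def metadata_consistent(inodes, bitmap) -> bool:
    """Toy fsck: each referenced block is allocated and owned by one file,
    each allocated block is referenced, and each size fits its block count."""
    seen = set()
    for ino in inodes:
        if ino.size > len(ino.blocks) * BLOCK_SIZE:
            return False
        for b in ino.blocks:
            if b in seen or not bitmap[b]:
                return False
            seen.add(b)
    return seen == {i for i, used in enumerate(bitmap) if used}


# Block 1 is free but still holds leftovers from a deleted file.
disk = [b"log-0001", b"OLDDATA!"]
bitmap = [True, False]
log = Inode(size=8, blocks=[0])

# Crash scenario: the metadata for the extension reaches the disk,
# the new data for block 1 does not.
log.blocks.append(1)
log.size = 16
bitmap[1] = True

print(metadata_consistent([log], bitmap))  # the structure checks out
contents = b"".join(disk[b] for b in log.blocks)[:log.size]
print(contents)                            # the file now ends in stale data
```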
The probability is low, for sure - but then, the chance of a file system
failure is already low.
Nobody has yet shown me that betting on ext2 for the postgresql xlog is reliable.
Telling me that journalling is misunderstood doesn't prove to me that you
understand it.
I don't mean to be offensive, but I won't accept what you say, as it does
not make sense with my understanding of how file systems work. :-)
Cheers,
mark
--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...