From: | Marco Colombo <pgsql(at)esiway(dot)net> |
---|---|
To: | "pgsql-general(at)postgresql(dot)org >> Postgres general mailing list" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Maximum transaction rate |
Date: | 2009-03-18 23:49:52 |
Message-ID: | 49C188A0.5050409@esiway.net |
Lists: | pgsql-general |
Martijn van Oosterhout wrote:
> Generally PG uses O_SYNC on open, so it's only one system call, not
> two. And the file it's writing to is generally preallocated (not
> always though).
In that case it has to wait for I/O completion on write(), so it has to
go to sleep. If two different processes do a write(), you don't know which
will be awakened first. Preallocation doesn't mean much here, since with
O_SYNC you expect a physical write to be done (with the whole sleep/
HW interrupt/SW interrupt/awake dance). It's true that you may expect
the writes to be carried out in order, and that might be enough. I'm
not sure, though.
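To make the "one system call, not two" point concrete, here is a minimal sketch of the two ways a process can ask for a synchronous append. It is not how PostgreSQL itself is written; the function names and the file name `toy_wal` are made up for illustration, and it assumes a POSIX system where Python exposes `os.O_SYNC`.

```python
import os
import tempfile

# O_SYNC is a POSIX flag. Where Python does not expose it, this sketch falls
# back to 0, which means the first variant is no longer synchronous at all.
O_SYNC = getattr(os, "O_SYNC", 0)


def append_o_sync(path: str, record: bytes) -> int:
    """Hypothetical helper: one write() call, which blocks until the kernel
    reports the I/O as complete (the caller sleeps inside write())."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND | O_SYNC, 0o600)
    try:
        return os.write(fd, record)
    finally:
        os.close(fd)


def append_then_fsync(path: str, record: bytes) -> int:
    """Hypothetical helper: two calls, a buffered write() that returns
    quickly, then fsync(), which is where the caller sleeps."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        written = os.write(fd, record)
        os.fsync(fd)
        return written
    finally:
        os.close(fd)


def demo() -> int:
    """Append one record with each variant and return the file size."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy_wal")
        append_o_sync(path, b"commit 1\n")
        append_then_fsync(path, b"commit 2\n")
        return os.path.getsize(path)
```

Either way the process ends up sleeping until the I/O completes; the sketch only shows where the wait happens, not which of two concurrent writers is awakened first.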
>> Well, that's highly dependent on your expectations :) I don't expect
>> a fsync to trigger a journal commit, if metadata hasn't changed. That's
>> obviously true for metadata-only journals (like most of them, with
>> the notable exception of ext3 in data=journal mode).
>
> Really the only thing needed is that the WAL entry reaches disk before
> the actual data does. AIUI as long as you have that the situation is
> recoverable. Given that the actual data probably won't be written for a
> while it'd need to go pretty wonky before you see an issue.
You're giving up Durability here. In a closed system, that doesn't mean
much, but when you report "payment accepted" to third parties, you can't
forget about it later. The requirement you stated is for Consistency only.
That's what a journaled FS cares about, i.e. no need for fsck (internal
consistency checks) after a crash. It may be acceptable for a remote
standby backup: you replay as much of the WAL as is available after
the crash (the part you managed to copy, that is). But you know there
can be lost transactions.
It may be acceptable or not. Sometimes it's not. Sometimes you must be
sure the data is on platters before you report "committed". Sometimes
when you say "fsync!" you mean "I want data flushed to disk NOW, and I
really mean it!". :)
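The Consistency vs. Durability distinction above can be shown with a toy model. This is only a sketch under loud assumptions: `ToyWal`, `commit` and `lost_after_crash` are hypothetical names, the "disk" and "cache" are plain in-memory lists, and nothing here touches a real filesystem or PostgreSQL.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWal:
    """Hypothetical log model: 'disk' survives a crash, 'cache' does not."""
    disk: list = field(default_factory=list)
    cache: list = field(default_factory=list)

    def append(self, txid: int) -> None:
        # The record is only buffered at this point.
        self.cache.append(txid)

    def flush(self) -> None:
        # Records reach the disk in the order they were appended, so the
        # log on disk is always a consistent prefix of what was written.
        self.disk.extend(self.cache)
        self.cache.clear()

    def crash(self) -> None:
        # Everything that was not flushed is gone.
        self.cache.clear()


def commit(wal: ToyWal, txid: int, wait_for_flush: bool) -> int:
    """Append a record and report the transaction as committed.

    With wait_for_flush=False the report goes out before the record is on
    disk: ordering is preserved, durability is not.
    """
    wal.append(txid)
    if wait_for_flush:
        wal.flush()
    return txid  # at this point the client is told "committed"


def lost_after_crash(wait_for_flush: bool) -> list:
    """Return the transactions that were acknowledged but not recovered."""
    wal = ToyWal()
    acked = [commit(wal, txid, wait_for_flush) for txid in (1, 2, 3)]
    wal.crash()
    recovered = list(wal.disk)  # replay as much of the WAL as is available
    return [txid for txid in acked if txid not in recovered]
```

In this model `lost_after_crash(True)` returns `[]` and `lost_after_crash(False)` returns `[1, 2, 3]`. In both cases the recovered log is consistent; only in the first case does "committed" mean what the third party thinks it means.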
.TM.