From: | Marco Colombo <pgsql(at)esiway(dot)net> |
---|---|
To: | Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> |
Cc: | Christophe <xof(at)thebuild(dot)com>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: Maximum transaction rate |
Date: | 2009-03-14 04:25:11 |
Message-ID: | 49BB31A7.1090604@esiway.net |
Lists: | pgsql-general |
Scott Marlowe wrote:
> On Fri, Mar 13, 2009 at 1:09 PM, Christophe <xof(at)thebuild(dot)com> wrote:
>> So, if the software calls fsync, but fsync doesn't actually push the data to
>> the controller, you are still at risk... right?
>
> Ding!
>
I've been doing some googling, and now I'm not sure that not supporting barriers
implies not supporting (or lying about) blkdev_issue_flush(). It seems that
it's pretty common (and well-defined) for block devices to report
-EOPNOTSUPP for BIO_RW_BARRIER requests. Device mapper apparently falls into
this category.
See:
http://lkml.org/lkml/2007/5/25/71
This is an interesting discussion on barriers and flushing.
It seems to me that PostgreSQL needs both ordered and synchronous
writes, maybe at different times (not that EVERY write must be both ordered
and synchronous).
You can emulate ordered writes with single synchronous writes (one at a
time), although at a price.
You can't emulate synchronous writes with just barriers.
OPTIMAL: write-barrier-write-barrier-write-barrier-flush
SUBOPTIMAL: write-flush-write-flush-write-flush
As I understand it, fsync() should always issue a real flush: it's unrelated
to the barriers issue.
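As a rough illustration of the SUBOPTIMAL pattern above, here is a minimal Python sketch of emulating ordered writes at user level by following every write with fsync(). The function name write_ordered and the record contents are made up for the example. Whether each fsync() really reaches stable storage still depends on the filesystem, the block layer and the disk cache settings discussed in this thread.

```python
import os
import tempfile


def write_ordered(path, records):
    """Append each record and fsync() before writing the next one.

    This is the write-flush-write-flush pattern: ordering is obtained
    only because every write is made synchronous, which is the price.
    Returns the number of fsync() calls issued.
    """
    flushes = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for rec in records:
            view = memoryview(rec)
            # os.write() may write fewer bytes than asked; loop until done.
            while len(view) > 0:
                written = os.write(fd, view)
                view = view[written:]
            # Ask the kernel to flush this file's data before continuing.
            os.fsync(fd)
            flushes += 1
    finally:
        os.close(fd)
    return flushes


if __name__ == "__main__":
    # Hypothetical demo: three small records in a scratch directory.
    scratch = tempfile.mkdtemp()
    log_path = os.path.join(scratch, "demo.log")
    count = write_ordered(log_path, [b"first\n", b"second\n", b"third\n"])
    print("fsync calls issued:", count)
```

With barriers available to user code, the same ordering could in principle be had with a single flush at the end. Without them, one fsync() per record is the only portable way to guarantee that "second" never hits the disk before "first".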
There's no API to issue ordered writes (or barriers) at user level,
AFAIK. (Uhm... maybe O_DIRECT implies that?)
FS code may internally issue barrier requests to the block device, for
its own purposes (e.g. journal updates), but there's no userland API for
that.
Yet, there's no reference to DM not supporting flush correctly in the
whole thread... actually there are references to the opposite. DM devices
are defined as FLUSHABLE.
Also see:
http://lkml.org/lkml/2008/2/26/41
But it seems to me that all this discussion is under the assumption that
disks have write-back caches.
"The alternative is to disable the disk write cache." says it all.
.TM.