Re: Fwd: Is the fsync() fake on FreeBSD6.1?

From: Andrew - Supernews <andrew+nonews(at)supernews(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fwd: Is the fsync() fake on FreeBSD6.1?
Date: 2006-09-23 20:08:14
Message-ID: slrnehb51e.2ea3.andrew+nonews@atlantis.supernews.net
Lists: pgsql-hackers

On 2006-09-23, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andrew - Supernews <andrew+nonews(at)supernews(dot)com> writes:
>> Whether the underlying device lies about the write completion is another
>> matter. All current SCSI disks have WCE enabled by default, which means
>> that they will lie about write completion if FUA was not set in the
>> request, which FreeBSD never sets.
>
> Huh? The entire point of the SCSI command set is that it's not
> necessary to lie about write completion for performance reasons, because
> the architecture has always supported the concept of multiple requests
> in-flight concurrently.

I seem to recall we've had this conversation previously.

> Has the disk drive industry gotten a whole lot
> stupider in the fifteen years since I last wrote a SCSI driver?

Quite possibly, yes.

I certainly would never claim that WCE is a good idea, or that having it
enabled by default is a good idea; I merely report the _fact_ that it is
indeed enabled by default on every SCSI drive that I have recently
encountered (across several different vendors).

On my database machines I am careful to disable it (and check that this
does indeed take effect). I would recommend that others do likewise. The
performance impact of disabling WCE is not serious (other than removing
the unsafe speed gains of course).

Since posting the previous response I've been directed to a document that
seems to imply that Linux drivers now attempt to handle write-order
guarantees by introducing the concept of a "write barrier", i.e. a write
request which must complete after all previous writes and before all
subsequent ones. Achieving this requires different strategies depending
on whether the underlying device allows command-queueing and/or exposes a
useful cache flush command; the implication of this is that for SCSI disks
with WCE, the Linux driver will actually send SYNCHRONIZE CACHE when doing
a write barrier (which could be expensive of course). If (and I have no
idea if this is true) fsync() is implemented by means of such a barrier,
then this implies that an fsync()-heavy workload will perform much worse
on Linux when WCE is enabled than when it is disabled, since in the latter
case the driver will not issue SYNCHRONIZE CACHE and will simply ensure
that the relevant writes are all completed.

It would be interesting to see benchmarks of this.
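
A minimal sketch of what such a benchmark might look like, assuming a
POSIX-like system and a scratch directory on the disk under test. The
function names, the 8192-byte block size (chosen to match PostgreSQL's
default page size) and the iteration count are illustrative assumptions,
not anything from the thread. It only times fsync() from user space; it
cannot tell which commands the driver actually sends to the drive.

```python
import os
import tempfile
import time


def time_fsync_writes(directory, count=200, block_size=8192):
    """Append `count` blocks to a scratch file, calling fsync() after each.

    Returns the list of per-call fsync() latencies in seconds.
    `directory` should live on the disk being tested (an assumption of
    this sketch; the system temp directory may be on a different device).
    """
    block = b"\0" * block_size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(count):
            os.write(fd, block)
            start = time.perf_counter()
            os.fsync(fd)  # ask the OS to push the data to stable storage
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


def summarize(latencies):
    """Reduce a list of latencies (seconds) to a few headline numbers."""
    total = sum(latencies)
    return {
        "calls": len(latencies),
        "mean_ms": 1000.0 * total / len(latencies),
        "max_ms": 1000.0 * max(latencies),
        "fsyncs_per_sec": len(latencies) / total if total > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Hypothetical usage: point this at a directory on the disk under test.
    print(summarize(time_fsync_writes(tempfile.gettempdir())))
```

Running this against the same disk with WCE enabled and then disabled
would give the comparison in question. A sustained fsync() rate far above
what the disk's rotation speed allows would suggest that writes are being
acknowledged from cache rather than from the platters.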

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
