From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Greg Smith <greg(at)2ndquadrant(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org> |
Subject: | Re: Use of O_DIRECT only for open_* sync options |
Date: | 2011-01-25 00:19:45 |
Message-ID: | 201101250019.p0P0Jj900973@momjian.us |
Lists: | pgsql-hackers |
Greg Smith wrote:
> Bruce Momjian wrote:
> > xlogdefs.h says:
> >
> > /*
> > * Because O_DIRECT bypasses the kernel buffers, and because we never
> > * read those buffers except during crash recovery, it is a win to use
> > * it in all cases where we sync on each write(). We could allow O_DIRECT
> > * with fsync(), but because skipping the kernel buffer forces writes out
> > * quickly, it seems best just to use it for O_SYNC. It is hard to imagine
> > * how fsync() could be a win for O_DIRECT compared to O_SYNC and O_DIRECT.
> > * Also, O_DIRECT is never enough to force data to the drives, it merely
> > * tries to bypass the kernel cache, so we still need O_SYNC or fsync().
> > */
> >
> > This seems wrong because fsync() can win if there are two writes before
> > the sync call. Can kernels not issue fsync() if the write was O_DIRECT?
> > If that is the cause, we should document it.
> >
>
> The comment does look busted, because you did imagine exactly a case
> where they might be combined. The only incompatibility that I'm aware
> of is that O_DIRECT requires reads and writes to be aligned properly, so
> you can't use it in random application code unless it's aware of that.
> O_DIRECT and fsync are compatible; for example, MySQL allows combining
> the two: http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html
>
> (That whole bit of documentation around innodb_flush_method includes
> some very interesting observations around O_DIRECT actually)
>
> I'm starting to consider the idea that much of the performance gain
> seen on earlier systems with O_DIRECT was because it decreased CPU usage
> shuffling things into the OS cache, rather than its impact on avoiding
> pollution of said cache. On Linux for example, its main accomplishment
> is described like this: "File I/O is done directly to/from user space
> buffers."
> http://www.kernel.org/doc/man-pages/online/pages/man2/open.2.html
> The earliest paper on the implementation suggests a big decrease in CPU
> overhead from that:
> http://www.ukuug.org/events/linux2001/papers/html/AArcangeli-o_direct.html
>
> Impossible to guess whether that's more true ("CPU cache pollution is a
> bigger problem now") or less true ("drives are much slower relative to
> CPUs now") today. I'm trying to remain agnostic and let the benchmarks
> offer an opinion instead.

Agreed. Perhaps we need a separate setting to turn direct I/O on and
off, and decouple wal_sync_method and direct I/O.
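
For reference, the combination discussed above (several O_DIRECT writes followed by a single fsync()) can be sketched roughly as below. This is only an illustration, not PostgreSQL code: the helper name write_blocks_then_fsync, the 4096-byte BLOCK_SIZE, and the fallback to a plain open() when the filesystem rejects O_DIRECT are all assumptions made for the example. The real alignment requirement depends on the device and filesystem.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumed alignment; the real requirement is device/filesystem specific. */
#define BLOCK_SIZE 4096

/*
 * Hypothetical helper: write nblocks aligned blocks, then sync once.
 * Returns 0 on success, -1 on failure.
 */
int
write_blocks_then_fsync(const char *path, int nblocks)
{
    int     flags = O_WRONLY | O_CREAT | O_TRUNC;
    int     fd;
    int     rc = -1;
    void   *buf = NULL;

    assert(nblocks > 0);

#ifdef O_DIRECT
    fd = open(path, flags | O_DIRECT, 0600);
    if (fd < 0 && errno == EINVAL)
        fd = open(path, flags, 0600);   /* filesystem rejects O_DIRECT */
#else
    fd = open(path, flags, 0600);       /* platform has no O_DIRECT */
#endif
    if (fd < 0)
        return -1;

    /* O_DIRECT wants the buffer address, length and file offset aligned. */
    if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE) != 0)
    {
        close(fd);
        return -1;
    }
    memset(buf, 'x', BLOCK_SIZE);

    for (int i = 0; i < nblocks; i++)
    {
        if (write(fd, buf, BLOCK_SIZE) != BLOCK_SIZE)
            goto done;
    }

    /*
     * O_DIRECT only tries to bypass the kernel cache; it does not force
     * data to the drive.  One fsync() here covers all the writes above.
     */
    if (fsync(fd) == 0)
        rc = 0;

done:
    free(buf);
    close(fd);
    return rc;
}
```

With nblocks = 2 this is the "two writes before the sync call" case: two write() calls and one fsync(), where O_SYNC would have synced on each write.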
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +