From: | "Takayuki Tsunakawa" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com> |
---|---|
To: | "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> |
Cc: | <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Load distributed checkpoint |
Date: | 2006-12-11 12:18:26 |
Message-ID: | 039501c71d1e$75e4c5e0$19527c0a@OPERAO |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-patches |
Hello,
From: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
"Takayuki Tsunakawa" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com> wrote:
>> I'm afraid it is difficult for system designers to expect steady
>> throughput/response time, as long as PostgreSQL depends on the
>> flushing of file system cache. How does Oracle provide stable
>> performance?
>> Though I'm not sure, isn't it the key to use O_SYNC so that
write()s
>> transfer data to disk?
>
> AFAIK, other databases use write() and fsync() in combination. They
call
> fsync() immediately after they write buffers in some small batches.
Otherwise,
> they uses asynchronous and direct I/O options. Therefore, dirty
pages in
> kernel buffers are keeped to be low at any time.
Oracle seems to use O_SYNC. I've found a related page in the Oracle
manuals.
--------------------------------------------------
http://download-west.oracle.com/docs/cd/B19306_01/win.102/b15688/ap_unix.htm
Direct Writes to Disk
On both UNIX and Windows platforms, bypassing the file system buffer
cache ensures data is written to disk.
On UNIX, Oracle Database uses the O_SYNC flag to bypass the file
system buffer cache. The flag name depends on the UNIX port.
On Windows, Oracle Database bypasses the file system buffer cache
completely.
--------------------------------------------------
As Itagaki-san says, asynchronous+direct I/O provides best
performance, I believe. Oracle supplies the combination as follows:
--------------------------------------------------
http://download-west.oracle.com/docs/cd/B19306_01/server.102/b15658/appc_linux.htm#sthref870
Oracle Database supports kernel asynchronous I/O. This feature is
disabled by default.
By default, the DISK_ASYNCH_IO initialization parameter in the
parameter file is set to TRUE to enable asynchronous I/O on raw
devices. To enable asynchronous I/O on file system files:
Ensure that all Oracle Database files are located on file systems that
support asynchronous I/O.
Set the FILESYSTEMIO_OPTIONS initialization parameter in the parameter
file to one of the following values:
Linux Distribution Recommended Value
SUSE Linux Enterprise Server 9 SETALL
Other distributions ASYNCH
--------------------------------------------------
I believe SQL Server also uses direct+asynchronous I/O, because
Microsoft recommends in MSDN that the combination plus appropriate
multi-threading provides best performance.
I tested Oracle9i on RHEL 2.1. I straced DBWn (database writer, which
is like the bgwriter of PostgreSQL) while creating tables, indexes,
etc. and shutting down the database server. Oracle surely uses the
O_SYNC as follows, but it doesn't use fsync().
24462 open("/work4/ora/tuna/users01.dbf", O_RDWR|O_SYNC|O_LARGEFILE) =
16
I wonder how the other big DBMS, IBM DB2, handles this. Is Itagaki-san
referring to DB2?
From | Date | Subject | |
---|---|---|---|
Next Message | Takayuki Tsunakawa | 2006-12-11 12:22:44 | Re: Load distributed checkpoint |
Previous Message | Richard Huxton | 2006-12-11 12:05:37 | Re: EXPLAIN ANALYZE |
From | Date | Subject | |
---|---|---|---|
Next Message | Takayuki Tsunakawa | 2006-12-11 12:22:44 | Re: Load distributed checkpoint |
Previous Message | ITAGAKI Takahiro | 2006-12-11 10:38:16 | Re: Load distributed checkpoint |