| From: | James Mansion <james(at)mansionfamily(dot)plus(dot)com> | 
|---|---|
| To: | Greg Smith <gsmith(at)gregsmith(dot)com> | 
| Cc: | pgsql-performance(at)postgresql(dot)org | 
| Subject: | Re: POSIX file updates | 
| Date: | 2008-04-02 19:10:29 | 
| Message-ID: | 47F3DA25.7050705@mansionfamily.plus.com | 
| Lists: | pgsql-performance | 
Greg Smith wrote:
> "After a write() to a regular file has successfully returned, any 
> successful read() from each byte position in the file that was 
> modified by that write() will return the data that was written by the 
> write()...a similar requirement applies to multiple write operations 
> to the same file position"
>
Yes, but that doesn't say anything about simultaneous reads and writes from multiple threads in the same or different processes, each holding a descriptor on the same file.
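What the quoted text does guarantee is the sequential case: once `write()` has returned, a later `read()` of those bytes sees the new data, even through a different descriptor. The sketch below shows only that guaranteed case. The helper name `read_after_write_two_fds` is hypothetical, and the example makes no claim about reads that race with a write still in progress, which is exactly the gap raised above.

```python
import os
import tempfile

def read_after_write_two_fds(payload: bytes) -> bytes:
    """Write through one descriptor, then read back through another.

    Demonstrates POSIX read-after-write ordering only; it says nothing
    about a read() that overlaps a write() still in progress."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data")
        fd_w = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
        fd_r = os.open(path, os.O_RDONLY)  # independent descriptor, same file
        try:
            written = os.write(fd_w, payload)  # typically returns once data is in the OS cache
            assert written == len(payload)
            # No fsync is needed for visibility: on a normal buffered file both
            # descriptors go through the same kernel page cache.
            return os.pread(fd_r, len(payload), 0)
        finally:
            os.close(fd_w)
            os.close(fd_r)

print(read_after_write_two_fds(b"hello"))  # b'hello'
```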
No matter, I was thinking about a case with direct unbuffered I/O. Too many years using Sybase on raw devices. :-(
Though some of the performance studies on UFS directio suggest that there are indeed benefits to managing the write-through yourself rather than using the OS as a poor man's background thread to do it. SQL Server, I believe, allows checkpoint completion to be configured with deadline scheduling. That seems to me a very desirable feature, but it does need more active scheduling of the write-back.
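A minimal sketch of the deadline-style pacing described above, assuming the goal is to spread a checkpoint's writes evenly so they are all issued by some fraction of the checkpoint interval, rather than leaving the OS to flush them in one burst. The names `paced_write_schedule`, `run_paced` and the `completion_target` parameter are hypothetical; this is not SQL Server's or PostgreSQL's implementation.

```python
import time
from typing import Callable, List, Sequence

def paced_write_schedule(n_buffers: int, interval_s: float,
                         completion_target: float) -> List[float]:
    """Offsets (seconds from checkpoint start) at which to issue each write
    so that all n_buffers are issued by completion_target * interval_s,
    spread evenly rather than in one burst at the start."""
    if not 0.0 < completion_target <= 1.0:
        raise ValueError("completion_target must be in (0, 1]")
    if n_buffers <= 0:
        return []
    step = interval_s * completion_target / n_buffers
    # Write i is due at the start of slot i; the last slot ends at the deadline.
    return [i * step for i in range(n_buffers)]

def run_paced(buffers: Sequence[bytes], interval_s: float, completion_target: float,
              write_buffer: Callable[[bytes], None]) -> None:
    """Issue each write no earlier than its scheduled offset."""
    start = time.monotonic()
    schedule = paced_write_schedule(len(buffers), interval_s, completion_target)
    for buf, due in zip(buffers, schedule):
        delay = start + due - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # the write-back rate is set here, not by the OS
        write_buffer(buf)

print(paced_write_schedule(4, 10.0, 0.5))  # [0.0, 1.25, 2.5, 3.75]
```

In a real system the `write_buffer` callback would write one dirty page; the point of the design is that the database, not the kernel's write-back heuristics, decides how fast dirty data reaches the device.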
>
> It's clear that such relaxation has benefits with some of Oracle's 
> mechanisms as described.  But amusingly, PostgreSQL doesn't even 
> support Solaris's direct I/O method right now unless you override the 
> filesystem mounting options, so you end up needing to split it out and 
> hack at that level regardless.
Indeed, that's a shame. Why doesn't it use directio?
> PostgreSQL writes transactions to the WAL.  When they have reached 
> disk, confirmed by a successful f[data]sync or a completed synchronous 
> write, that transaction is now committed.  Eventually the impacted 
> items in the buffer cache will be written as well.  At checkpoint 
> time, things are reconciled such that all dirty buffers at that point 
> have been written, and now f[data]sync is called on each touched file 
> to make sure those changes have made it to disk.
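The ordering Greg describes can be modelled in a few lines. This is a toy sketch, not PostgreSQL's code: `MiniStore`, `commit`, `write_page` and `checkpoint` are hypothetical names, there is a single WAL file, and data pages go straight to the OS cache. It shows only the sequence: sync the WAL before reporting a commit, write data pages lazily, and at checkpoint sync every data file touched since the last one.

```python
import os
import tempfile

# fdatasync is not exposed on every platform (e.g. macOS); fall back to fsync.
_datasync = getattr(os, "fdatasync", os.fsync)

class MiniStore:
    """Toy model of WAL commit plus checkpoint ordering (hypothetical)."""

    def __init__(self, directory: str):
        self.dir = directory
        self.wal_fd = os.open(os.path.join(directory, "wal"),
                              os.O_CREAT | os.O_WRONLY | os.O_APPEND, 0o600)
        self.data_fds = {}    # data file name -> fd
        self.touched = set()  # data files dirtied since the last checkpoint

    def commit(self, record: bytes) -> None:
        # 1. Append the WAL record and force it to disk. Only after the sync
        #    returns may the transaction be reported as committed.
        os.write(self.wal_fd, record + b"\n")
        _datasync(self.wal_fd)

    def write_page(self, name: str, offset: int, page: bytes) -> None:
        # 2. Data pages are written later (after their WAL is synced) and
        #    may sit in the OS cache indefinitely.
        fd = self.data_fds.get(name)
        if fd is None:
            fd = os.open(os.path.join(self.dir, name), os.O_CREAT | os.O_RDWR, 0o600)
            self.data_fds[name] = fd
        os.pwrite(fd, page, offset)
        self.touched.add(name)

    def checkpoint(self) -> list:
        # 3. At checkpoint, sync each data file touched since the last one.
        synced = sorted(self.touched)
        for name in synced:
            _datasync(self.data_fds[name])
        self.touched.clear()
        return synced

    def close(self) -> None:
        for fd in self.data_fds.values():
            os.close(fd)
        os.close(self.wal_fd)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = MiniStore(d)
        store.commit(b"txn1: set page 0 of rel1")  # durable before success is reported
        store.write_page("rel1", 0, b"A" * 8192)   # may linger in the OS cache
        print(store.checkpoint())                   # ['rel1']
        store.close()
```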
Yes, but a successful fsync and data being stable on disk aren't the same thing if there is a cache anywhere below the OS, are they? Hence the fuss a while back about Apple's control of disk caches. Solaris and Windows behave similarly.
Isn't allowing the OS to accumulate an arbitrary number of dirty blocks, with no control over the rate at which they spill to media, just exposing us to the possibility of an I/O storm at checkpoint time? Does the bgwriter attempt to control this with intermediate fsyncs (and a push to media where available)?
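One way to bound how much dirty data piles up in the OS cache is for the writer itself to fsync after every N bytes rather than once at the end. The sketch below assumes that approach; `write_with_intermediate_sync` and `sync_every_bytes` are hypothetical names, not a bgwriter feature. Note that, per the point above, fsync only bounds what the OS holds; a volatile drive cache may still hold the data.

```python
import os
import tempfile

def write_with_intermediate_sync(fd: int, chunks, sync_every_bytes: int) -> int:
    """Write chunks to fd, calling fsync whenever at least sync_every_bytes
    have been written since the last sync, plus once at the end.
    Returns the number of fsync calls made. Unsynced data from this writer
    stays below roughly sync_every_bytes plus one chunk."""
    if sync_every_bytes <= 0:
        raise ValueError("sync_every_bytes must be positive")
    unsynced = 0
    syncs = 0
    for chunk in chunks:
        view = memoryview(chunk)
        while view:
            n = os.write(fd, view)  # may be a short write; loop until done
            view = view[n:]
            unsynced += n
        if unsynced >= sync_every_bytes:
            os.fsync(fd)  # spread the flush out instead of one big one later
            syncs += 1
            unsynced = 0
    if unsynced:
        os.fsync(fd)
        syncs += 1
    return syncs

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        fd = os.open(os.path.join(d, "relfile"), os.O_CREAT | os.O_WRONLY, 0o600)
        try:
            # 100 pages of 8 KiB with a sync every 256 KiB
            print(write_with_intermediate_sync(fd, [b"x" * 8192] * 100, 256 * 1024))  # 4
        finally:
            os.close(fd)
```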
It strikes me as odd that fsync_writethrough isn't the preferred option where it is implemented. The Postgres approach of *requiring* that there be no cache below the OS is problematic, especially since battery backup on internal array controllers is hardly the handiest solution when you find the motherboard has died. And especially when the inability to flush caches on modern SATA and SAS drives would appear to be more a failing in some operating systems than in the drives themselves.
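On Mac OS X, Apple documents that fsync() does not ask the drive to flush its write cache; `fcntl` with `F_FULLFSYNC` does. As far as I understand, that is the mechanism fsync_writethrough relies on there. The sketch below prefers it when the platform exposes it and otherwise falls back to plain fsync. `flush_through` is a hypothetical helper, and whether the fallback reaches stable media depends on the OS, filesystem and drive.

```python
import os
import tempfile

try:
    import fcntl
except ImportError:  # e.g. Windows; this sketch assumes a POSIX system
    fcntl = None

def flush_through(fd: int) -> str:
    """Try to push data through the drive's write cache; return the method used."""
    full = getattr(fcntl, "F_FULLFSYNC", None) if fcntl else None
    if full is not None:
        try:
            fcntl.fcntl(fd, full)  # macOS: flush to the OS and ask the drive to flush
            return "F_FULLFSYNC"
        except OSError:
            pass  # e.g. a filesystem that doesn't support it; fall back
    os.fsync(fd)  # may leave data in a volatile drive cache
    return "fsync"

if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        f.write(b"commit record")
        f.flush()  # move Python's userspace buffer into the OS first
        print(flush_through(f.fileno()))
```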
The links I've been accumulating into my bibliography include:
http://www.h2database.com/html/advanced.html#transaction_isolation
http://lwn.net/Articles/270891/
http://article.gmane.org/gmane.linux.kernel/646040
http://lists.apple.com/archives/darwin-dev/2005/Feb/msg00072.html
http://brad.livejournal.com/2116715.html
And your handy document on WAL tuning, of course.
James