From: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | Sean Chittenden <sean(at)chittenden(dot)org> |
Cc: | "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: O_DIRECT in freebsd |
Date: | 2003-06-22 23:50:48 |
Message-ID: | 200306222350.h5MNomr03736@candle.pha.pa.us |
Lists: | pgsql-hackers |
Basically, when we read a buffer we don't know whether it will be used
read-only or read/write. In fact, we could read it in, and another
backend could write it for us.
The big issue is that when we do a write, we don't wait for it to get to
disk.
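
To make the write behavior concrete, here is a minimal sketch of an ordinary buffered write. write_block() is a hypothetical helper made up for illustration, not a PostgreSQL function. The write() call returns once the kernel has copied the data into its cache; the caller only waits for the disk if it asks for that explicitly with fsync().

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration only (not PostgreSQL code).
 * Writes len bytes to path and returns 0 on success, -1 on error.
 */
int write_block(const char *path, const void *data, size_t len,
                bool wait_for_disk)
{
    assert(path != NULL && data != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = len;
    while (left > 0) {
        /* Returns once the kernel cache has the data, not the disk. */
        ssize_t n = write(fd, p, left);
        if (n <= 0) {           /* simplification: EINTR is not retried */
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t) n;
    }

    /* Only this optional step waits for the data to reach the disk. */
    if (wait_for_disk && fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

With wait_for_disk set to false the call normally returns without any disk wait, which is the behavior that a descriptor opened with O_DIRECT would take away.
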
It seems that to use O_DIRECT, we would have to read the buffer in a special
way to mark it as read-only, which seems kind of strange. I see no
reason we can't use free-behind in the PostgreSQL buffer cache to get
most of the benefits of O_DIRECT, without the read-only buffer restriction.
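
As a rough illustration of the free-behind idea, here is a toy buffer pool. It is a sketch under stated assumptions: Pool, pool_access() and the four-slot size are all hypothetical, and this is not how PostgreSQL's buffer manager is written. A page brought in with free_behind set is marked as the first eviction candidate, so a large scan keeps recycling one slot instead of pushing out the pages other queries are using.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE    4
#define INVALID_PAGE (-1)

/* Hypothetical toy structures; not PostgreSQL's buffer manager. */
typedef struct {
    int  page;       /* page number held, or INVALID_PAGE */
    long last_used;  /* logical timestamp; the smallest is evicted first */
} Buffer;

typedef struct {
    Buffer buf[POOL_SIZE];
    long   clock;    /* increasing logical time */
} Pool;

void pool_init(Pool *p)
{
    assert(p != NULL);
    p->clock = 0;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        p->buf[i].page = INVALID_PAGE;
        p->buf[i].last_used = 0;
    }
}

bool pool_contains(const Pool *p, int page)
{
    for (size_t i = 0; i < POOL_SIZE; i++)
        if (p->buf[i].page == page)
            return true;
    return false;
}

/*
 * Access a page through the pool. With free_behind set, a page that had
 * to be loaded is left as the best eviction candidate, and a page that
 * was already cached keeps its existing recency.
 */
void pool_access(Pool *p, int page, bool free_behind)
{
    size_t hit = POOL_SIZE;   /* POOL_SIZE means "not found" */
    size_t victim = 0;

    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (p->buf[i].page == page)
            hit = i;
        if (p->buf[i].last_used < p->buf[victim].last_used)
            victim = i;
    }

    if (hit != POOL_SIZE) {
        if (!free_behind)
            p->buf[hit].last_used = ++p->clock;
        return;
    }

    /* Miss: a real pool would read the page from disk here. */
    p->buf[victim].page = page;
    p->buf[victim].last_used = free_behind ? 0 : ++p->clock;
}
```

In this sketch, three hot pages survive a 100-page scan when the scan uses free_behind, and all three are evicted when it does not. Nothing here requires the buffers to be read-only.
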
---------------------------------------------------------------------------
Sean Chittenden wrote:
> > _That_ is an excellent point. However, do we know at the time we
> > open the file descriptor if we will be doing this?
>
> Doesn't matter, it's an option to fcntl().
>
> > What about cache coherency problems with other backends not opening
> > with O_DIRECT?
>
> That's a problem for the kernel VM, if you mean cache coherency in the
> VM. If you mean inside of the backend, that could be a stickier
> issue, I think. I don't know enough of the internals yet to know if
> this is a problem or not, but you're right, it's certainly something
> to consider. Is the cache a write behind cache or is it a read
> through cache? If it's a read through cache, which I think it is,
> then the backend would have to dirty all cache entries pertaining to
> the relations being opened with O_DIRECT. The use case for that
> being:
>
> 1) a transaction begins
> 2) a few rows out of the huge table are read
> 3) a huge query is performed that triggers the use of O_DIRECT
> 4) the rows selected in step 2 are updated (this step should poison or
> update the cache, actually, and act as a write through cache if the
> data is in the cache)
> 5) the same few rows are read in again
> 6) transaction is committed
>
> Provided the cache is poisoned or updated in step 4, I can't see how
> or where this would be a problem. Please enlighten if there's a
> different case that would need to be taken into account. I can't
> imagine ever wanting to write out data using O_DIRECT and think that
> it's a read only optimization in an attempt to minimize the turn over
> in the OS's cache. From fcntl(2):
>
> O_DIRECT Minimize or eliminate the cache effects of reading and
> writing. The system will attempt to avoid caching the data you
> read or write. If it cannot avoid caching the data, it will
> minimize the impact the data has on the cache. Use of this
> flag can drastically reduce performance if not used with
> care.
>
>
> > And finally, how do we deal with the fact that writes to O_DIRECT
> > files will wait until the data hits the disk because there is no
> > kernel buffer cache?
>
> Well, two things.
>
> 1) O_DIRECT should never be used on writes... I can't think of a case
> where you'd want it off, even when COPY'ing data and restoring a
> DB, it just doesn't make sense to use it. The write buffer is
> emptied as soon as the pages hit the disk unless something is
> reading those bits, but I'd imagine the write buffer would be used
> to make sure that as much writing is done to the platter in a
> single write by the OS as possible, circumventing that would be
> insane (though useful possibly for embedded devices with low RAM,
> solid state drives, or some super nice EMC fiber channel storage
> device that basically has its own huge disk cache).
>
> 2) Last I checked PostgreSQL wasn't a threaded app and doesn't use
> non-blocking IO. The backend would block until the call returns,
> where's the problem? :)
>
> If anything O_DIRECT would shake out any bugs in PostgreSQL's caching
> code, if there are any. -sc
>
> --
> Sean Chittenden
--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073