Re: O_DIRECT in freebsd

From: Sean Chittenden <sean(at)chittenden(dot)org>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: O_DIRECT in freebsd
Date: 2003-06-23 00:31:29
Message-ID: 20030623003129.GH97131@perrin.int.nxad.com
Lists: pgsql-hackers

> Basically, we don't know when we read a buffer whether this is a
> read-only or read/write. In fact, we could read it in, and another
> backend could write it for us.

Um, wait. The cache is shared between backends? I don't think so,
but it shouldn't matter because there has to be a semaphore locking
the cache to prevent the coherency issue you describe. If PostgreSQL
didn't have one, it'd be having problems with this now. I'd also think that
MVCC would handle the case of updated data in the cache as that has to
be a common case. At what point is the cached result invalidated and
fetched from the OS?

> The big issue is that when we do a write, we don't wait for it to
> get to disk.

Only when fsync() is turned off, but again, that can of worms is up to
the OS to manage, and I think BSD takes care of it. From conf/NOTES:

# Attempt to bypass the buffer cache and put data directly into the
# userland buffer for read operation when O_DIRECT flag is set on the
# file. Both offset and length of the read operation must be
# multiples of the physical media sector size.
#
#options DIRECTIO

The offset and length bit kinda bothers me, though. I think that's
stuff the kernel has to take into account, not the userland calls.
I wonder whether it's actually accurate any more, or whether it affects
userland calls... seems like that'd be a bit too much work to have the
user do, esp. given the lack of documentation on the flag... it should
just be a drop-in additional flag, afaict.
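
If the alignment requirement in that NOTES comment does apply to userland
calls, the extra work on the caller's side would look roughly like the
sketch below. It is an illustration only: the 512-byte sector size and the
align_down(), align_up() and direct_read() helpers are assumptions made up
for this example, not anything taken from the FreeBSD or PostgreSQL sources.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux; harmless elsewhere */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* ASSUMPTION: 512-byte sectors. Real code would ask the device. */
#define SECTOR_SIZE 512

/* Round a non-negative offset down to the start of its sector. */
off_t align_down(off_t off, off_t sector)
{
    assert(sector > 0 && (sector & (sector - 1)) == 0); /* power of two */
    return off & ~(sector - 1);
}

/* Round a length up to a whole number of sectors. */
size_t align_up(size_t len, size_t sector)
{
    assert(sector > 0 && (sector & (sector - 1)) == 0); /* power of two */
    return (len + sector - 1) & ~(sector - 1);
}

/*
 * Hypothetical helper: read the sector-aligned window that covers
 * [off, off + len). On success, *out receives a buffer the caller must
 * free() and the number of bytes read is returned; on failure, -1.
 * The requested bytes begin at (char *)*out + (off - align_down(off)).
 */
ssize_t direct_read(const char *path, off_t off, size_t len, void **out)
{
    off_t   start = align_down(off, SECTOR_SIZE);
    size_t  span  = align_up((size_t)(off - start) + len, SECTOR_SIZE);
    void   *buf   = NULL;
    int     flags = O_RDONLY;
    ssize_t n;
    int     fd;

#ifdef O_DIRECT
    flags |= O_DIRECT;          /* only where the platform defines it */
#endif

    fd = open(path, flags);
    if (fd < 0)
        return -1;

    /* The buffer address is aligned as well as the offset and length. */
    if (posix_memalign(&buf, SECTOR_SIZE, span) != 0) {
        close(fd);
        return -1;
    }

    n = pread(fd, buf, span, start);
    close(fd);
    if (n < 0) {
        free(buf);
        return -1;
    }

    *out = buf;
    return n;
}
```

With these assumptions, a request for 100 bytes at offset 1000 turns into
a 1024-byte read starting at offset 512, with the wanted data sitting 488
bytes into the buffer. That is the bookkeeping a plain read() never asks
for.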

> It seems to use O_DIRECT, we would have to read the buffer in a
> special way to mark it as read-only, which seems kind of strange. I
> see no reason we can't use free-behind in the PostgreSQL buffer
> cache to handle most of the benefits of O_DIRECT, without the
> read-only buffer restriction.

I don't see how this'd be an issue, as buffers that are populated via a
read(), updated, and then written out would occupy a new chunk of disk
to satisfy MVCC. Why would we need to mark a buffer as read-only and
carry around/check its state?

-sc

--
Sean Chittenden
