Re: Experimental patch for inter-page delay in VACUUM

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ang Chin Han <angch(at)bytecraft(dot)com(dot)my>, Christopher Browne <cbbrowne(at)acm(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Experimental patch for inter-page delay in VACUUM
Date: 2003-11-10 14:23:37
Message-ID: 200311101423.hAAENbv10754@candle.pha.pa.us
Lists: pgsql-hackers

Jan Wieck wrote:
> Bruce Momjian wrote:
> > I would be interested to know if you have the background write process
> > writing old dirty buffers to kernel buffers continually if the sync()
> > load is diminished. What this does is to push more dirty buffers into
> > the kernel cache in hopes the OS will write those buffers on its own
> > before the checkpoint does its write/sync work. This might allow us to
> > reduce sync() load while preventing the need for O_SYNC/fsync().
>
> I tried that first. Linux 2.4 does not do that unless you tell it to by
> reducing the dirty data block aging time with update(8). So you have to
> force it to utilize the write bandwidth in the meantime. For that you
> have to call sync() or fsync() on something.
>
> Maybe O_SYNC is not as bad an option as it seems. In my patch, the
> checkpointer flushes the buffers in LRU order, meaning it flushes the
> least recently used ones first. This has the side effect that buffers
> returned for replacement (on a cache miss, when the backend needs to
> read the block) are most likely to be flushed/clean. So it reduces the
> write load of backends and thus the probability that a backend is ever
> blocked waiting on an O_SYNC'd write().
>
> I will add some counters and gather some statistics on how often the
> backend, in comparison to the checkpointer, calls write().

OK, new idea. How about if you write() the buffers, mark them as clean
and unlock them, then issue fsync(). The advantage here is that we can
allow the buffer to be reused while we wait for the fsync to complete.
Obviously, O_SYNC is not going to allow that. Another idea --- if
fsync() is slow because it can't find the dirty buffers, use write() to
write the buffers, copy the buffer to local memory, mark it as clean,
then open the file with O_SYNC and write it again. Of course, I am just
throwing out ideas here. The big thing I am concerned about is that
reusing buffers should not take too long.
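
To make the first idea concrete, here is a rough sketch in C. It is not
code from Jan's patch or from the backend: SketchBuffer, BLCKSZ as
defined here, and flush_then_fsync() are hypothetical names, and real
buffer locking is left out. Each dirty page is written with a plain
write, marked clean at once so it can be reused, and a single fsync()
follows at the end.

```c
#define _XOPEN_SOURCE 700   /* for pwrite(), fsync(), fileno() */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#define BLCKSZ 8192         /* page size assumed for this sketch */

/* Hypothetical stand-in for a shared buffer and its header. */
typedef struct {
    long block_no;          /* block number within the file */
    bool dirty;             /* true if the page must be written */
    char page[BLCKSZ];
} SketchBuffer;

/*
 * Write every dirty buffer with a plain (non-O_SYNC) write, mark it
 * clean immediately, then issue one fsync() for the whole file.
 * Returns the number of buffers written, or -1 on error.
 */
static int flush_then_fsync(int fd, SketchBuffer *bufs, int nbufs)
{
    int written = 0;

    assert(fd >= 0 && bufs != NULL);

    for (int i = 0; i < nbufs; i++) {
        if (!bufs[i].dirty)
            continue;

        off_t off = (off_t) bufs[i].block_no * BLCKSZ;
        if (pwrite(fd, bufs[i].page, BLCKSZ, off) != BLCKSZ) {
            perror("pwrite");
            return -1;
        }
        /* The buffer is reusable from here on, before fsync() runs. */
        bufs[i].dirty = false;
        written++;
    }

    /* Only after this returns is the data known to be on disk. */
    if (fsync(fd) != 0) {
        perror("fsync");
        return -1;
    }
    return written;
}
```

The trade-off in this sketch is that a clean flag no longer means "on
disk" until fsync() has returned, so whatever records the checkpoint as
complete would have to wait for that call.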

> > Perhaps sync() is bad partly because the checkpoint runs through all the
> > dirty shared buffers and writes them all to the kernel and then issues
> > sync() almost guaranteeing a flood of writes to the disk. This method
> > would find fewer dirty buffers in the shared buffer cache, and therefore
> > fewer kernel writes needed by sync().
>
> I don't understand this? How would what method reduce the number of page
> buffers the backends modify?

What I was saying is that if we only write() just before a checkpoint,
we never give the kernel a chance to write the buffers on its own. I
figured if we wrote them earlier, the kernel might write them for us and
sync wouldn't need to do it.
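
A rough sketch of what "writing them earlier" could look like, again
in C with made-up names (TrickleBuffer, trickle_round(), count_dirty())
and no locking. One background round writes a few dirty pages in LRU
order, as Jan describes for his checkpointer, and does not sync. Whether
the kernel then flushes them before the checkpoint depends on the OS; as
Jan notes above, Linux 2.4 may not without tuning.

```c
#define _XOPEN_SOURCE 700   /* for pwrite(), fileno() */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#define PAGE_BYTES 8192     /* page size assumed for this sketch */

/* Hypothetical buffer with a usage stamp for LRU ordering. */
typedef struct {
    long block_no;
    bool dirty;
    unsigned long last_used;    /* smaller = less recently used */
    char page[PAGE_BYTES];
} TrickleBuffer;

/*
 * One background round: write up to max_pages dirty buffers, least
 * recently used first, with plain write calls and no sync.
 * Returns the number of buffers written, or -1 on error.
 */
static int trickle_round(int fd, TrickleBuffer *bufs, int nbufs,
                         int max_pages)
{
    int written = 0;

    assert(fd >= 0 && bufs != NULL);

    while (written < max_pages) {
        int victim = -1;

        /* Pick the least recently used buffer that is still dirty. */
        for (int i = 0; i < nbufs; i++) {
            if (bufs[i].dirty &&
                (victim < 0 ||
                 bufs[i].last_used < bufs[victim].last_used))
                victim = i;
        }
        if (victim < 0)
            break;              /* nothing dirty is left */

        off_t off = (off_t) bufs[victim].block_no * PAGE_BYTES;
        if (pwrite(fd, bufs[victim].page, PAGE_BYTES, off) != PAGE_BYTES) {
            perror("pwrite");
            return -1;
        }
        bufs[victim].dirty = false;
        written++;
    }
    return written;
}

/* Number of buffers a later checkpoint would still have to write. */
static int count_dirty(const TrickleBuffer *bufs, int nbufs)
{
    int n = 0;

    for (int i = 0; i < nbufs; i++)
        if (bufs[i].dirty)
            n++;
    return n;
}
```

Calling trickle_round() periodically between checkpoints leaves fewer
dirty buffers for the checkpoint itself, which is the effect I was
hoping for; count_dirty() is only there to make that visible.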

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
