Re: [Testperf-general] BufferSync and bgwriter

From: Neil Conway <neilc(at)samurai(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Mark Wong <markw(at)osdl(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, testperf-general(at)pgfoundry(dot)org
Subject: Re: [Testperf-general] BufferSync and bgwriter
Date: 2004-12-13 02:43:33
Message-ID: 1102905813.23208.32.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, 2004-12-12 at 22:08 +0000, Simon Riggs wrote:
> > On Sun, 2004-12-12 at 05:46, Neil Conway wrote:
> > Is the plan to make bgwriter_percent = 100 the default setting?
>
> Hmm...must confess that my only plan is:
> i) discover dynamic behaviour of bgwriter
> ii) fix any bugs or wierdness as quickly as possible
> iii) try to find a way to set the bgwriter defaults

I was just curious why you were bothering to special-case
bgwriter_percent = 100 if it's not going to be the default setting (in
which case I would be surprised if more than 1 in 10 users would take
advantage of the patch).

> Right now, bgwriter_delay
> is useless because the O(N) behaviour makes it impossible to set any
> lower when you have a large shared_buffers.

BTW, I wouldn't be _too_ worried about O(N) behavior, except that we do
this scan while holding the BufMgrLock, which is a well known source of
contention. So reducing the time we hold that lock would be good.

> Your question has made me rethink the exact objective of the bgwriter's
> actions: The way it is coded now the bgwriter looks for dirty blocks, no
> matter where they are in the list.

Not sure what you mean. StrategyDirtyBufferList() returns the specified
number of dirty buffers in order, starting with the T1/T2 LRUs and going
back to the MRUs of both lists. bgwriter_percent effectively ignores
some portion of the tail of that list, so we end up just flushing the
buffers closest to the L1/L2 LRUs. How is this different from what
you're describing?

> bgwriter_percent would be the % of shared_buffers that are searched
> (from the LRU end) to see if they contain dirty buffers, which are
> then written to disk.

By definition, buffers closest to the LRU end of the lists are not
frequently accessed. If we only search the N% of the lists closest to
LRU, we will probably end up flushing just those pages to disk -- and
then not flushing anything else to disk in the subsequent bgwriter calls
because all the buffers close to the LRU will be non-dirty. That's okay
if all we're concerned about is avoiding write() by a real backend, but
we also want to smooth out checkpoint load, which I don't think this
approach would do well.

I suggest just getting rid of bgwriter_percent: AFAICS bgwriter_maxpages
is all the tuning we need, and I think "max # of pages to write" is a
simpler and more logical tuning knob than "% of the buffer pool to scan
looking for dirty buffers." So at each bufmgr invocation, we pick the at
most bgwriter_maxpages dirty pages from the pool, using the pages
closest to the LRUs of T1 and T2. I'd be happy to supply a patch to
implement that if you think it sounds okay.

-Neil

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2004-12-13 03:07:09 Re: Status of server side Large Object support?
Previous Message Bruce Momjian 2004-12-13 02:31:28 Re: somebody working on: Prevent default re-use of sysids