Re: Just-in-time Background Writer Patch+Test Results

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Just-in-time Background Writer Patch+Test Results
Date: 2007-09-06 16:27:44
Message-ID: Pine.GSO.4.64.0709061121020.14491@westnet.com
Lists: pgsql-hackers

On Thu, 6 Sep 2007, Kevin Grittner wrote:

> If you exposed the scan_whole_pool_seconds as a tunable GUC, that would
> allay all of my concerns about this patch. Basically, our problems were
> resolved by getting all dirty buffers out to the OS cache within two
> seconds

Unfortunately, it wouldn't make my concerns about your system go away, or
I'd have recommended exposing it specifically to address your situation.
I have been staring carefully at your configuration recently, and I would
wager that you could turn off the LRU writer altogether and still meet
your requirements in 8.2. Here's what you've got right now:

> shared_buffers = 160MB (=20000 buffers)
> bgwriter_lru_percent = 20.0
> bgwriter_lru_maxpages = 200
> bgwriter_all_percent = 10.0
> bgwriter_all_maxpages = 600

With the default delay of 200ms, this has the LRU-writer scanning the
whole pool every 1 second, while the all-writer scans every two
seconds--assuming they don't hit the write limits. If some event were to
dirty the whole pool in 200ms, it might take as much as 6.7 seconds to
write everything out (20000 / 600 * 200 ms) via the all-scan. The
all-scan is already gone in 8.3. Your LRU scan will take much longer than
that to clear everything out: at least 20 seconds (20000 / 200 * 200ms)
to clear a fully dirty cache.
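
The arithmetic above can be reproduced with a short Python sketch. The
function names are made up for illustration and none of this is PostgreSQL
code. It only assumes the 8.2 behavior described here: each bgwriter round
scans a fixed percentage of the pool and writes at most maxpages buffers.

```python
import math

def full_scan_seconds(percent, delay_ms):
    """Seconds to scan the whole pool when each round covers `percent` of it."""
    rounds = math.ceil(100.0 / percent)
    return rounds * delay_ms / 1000.0

def min_write_seconds(n_buffers, maxpages, delay_ms):
    """Lower bound on seconds to write n_buffers pages at maxpages per round."""
    return n_buffers / maxpages * delay_ms / 1000.0

N_BUFFERS = 20000   # shared_buffers = 160MB, as quoted above
DELAY_MS = 200      # default bgwriter_delay

lru_scan = full_scan_seconds(20.0, DELAY_MS)              # bgwriter_lru_percent
all_scan = full_scan_seconds(10.0, DELAY_MS)              # bgwriter_all_percent
all_write = min_write_seconds(N_BUFFERS, 600, DELAY_MS)   # bgwriter_all_maxpages
lru_write = min_write_seconds(N_BUFFERS, 200, DELAY_MS)   # bgwriter_lru_maxpages

print(f"LRU scan of whole pool: {lru_scan:.1f} s")
print(f"all-scan of whole pool: {all_scan:.1f} s")
print(f"all-scan write-out of a fully dirty pool: {all_write:.1f} s")
print(f"LRU write-out of a fully dirty pool: at least {lru_write:.1f} s")
```

The scan figures only hold when the maxpages limits are not hit; once the
pool is mostly dirty, the write limits dominate instead.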

But in fact, it's impossible to even bound how long it will take before
the LRU writer (which is the only part this new patch tries to improve)
gets around to writing even a single dirty buffer no matter what
bgwriter_lru_percent (8.2) or scan_whole_pool_seconds (JIT patch) is set
to.

There's a second low-level issue involved here. When a page becomes
dirty, that implies it was also recently used, which means the LRU writer
won't touch it. That page can't be written out by the LRU writer until an
entire pass has been made over the shared_buffer pool while looking for
buffers to allocate for new activity. When the allocation clock-sweep
passes over the newly dirtied buffer again, its usage count will drop by
one and it will no longer be considered recently used. At that point the
LRU writer can write it out. So unless there is other allocation activity
going on, the scan_whole_pool_seconds mechanism will never provide the
bound on time to scan and write everything you hope it will.
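
A toy Python model can make that interaction concrete. Everything in it
(the Buffer class, lru_writer_pass, allocation_sweep) is a hypothetical
simplification and not the actual bufmgr code: a newly dirtied buffer is
given a usage count of 1, the LRU writer only writes dirty buffers whose
usage count is 0, and a full pass of the allocation clock hand decrements
every nonzero usage count by one.

```python
from dataclasses import dataclass

@dataclass
class Buffer:
    dirty: bool = False
    usage_count: int = 0

def lru_writer_pass(pool):
    """Write every dirty buffer that is not recently used; return the count."""
    written = 0
    for buf in pool:
        if buf.dirty and buf.usage_count == 0:
            buf.dirty = False
            written += 1
    return written

def allocation_sweep(pool):
    """One full pass of the allocation clock hand (simplified)."""
    for buf in pool:
        if buf.usage_count > 0:
            buf.usage_count -= 1

pool = [Buffer() for _ in range(8)]
pool[3].dirty = True        # a page gets dirtied...
pool[3].usage_count = 1     # ...which also marks it as recently used

# With no allocation activity, any number of LRU writer scans writes nothing.
writes_without_allocation = sum(lru_writer_pass(pool) for _ in range(1000))

# After the allocation sweep passes over the buffer, the writer can take it.
allocation_sweep(pool)
writes_after_sweep = lru_writer_pass(pool)

print(writes_without_allocation, writes_after_sweep)
```

In this model, how often the writer scans makes no difference until the
allocation sweep has gone around once, which is the point made above.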

And if there are other allocations going on, the much more powerful JIT
mechanism will scan the whole pool plenty fast if you bump the already
exposed multiplier tunable up. In my tests where the buffer cache was
filled with mostly dirty buffers that couldn't be re-used (something
relatively easy to trigger with pgbench tests), I've actually watched the
new code scan >90% of the buffer cache looking for those few reusable
buffers in the pool in a single invocation. This would be like setting
bgwriter_lru_percent=90.0 in the old configuration, but it only gets that
aggressive when the distribution of pages in the buffer cache demands it,
and when it has reason to believe going that fast will be helpful.
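
As a rough illustration of why the scanned fraction adapts to the contents
of the cache, here is a Python sketch. It assumes, as a simplification that
is not taken from the patch itself, that the writer keeps scanning until it
has found as many reusable buffers as it expects to be allocated soon
(recent allocations times the multiplier). The function jit_scan and its
parameters are hypothetical names.

```python
def jit_scan(pool_reusable, expected_allocs, multiplier):
    """Scan until enough reusable buffers are found.

    pool_reusable is a list of booleans, one per buffer.
    Returns the fraction of the pool that had to be scanned.
    """
    target = int(expected_allocs * multiplier)
    found = 0
    scanned = 0
    for reusable in pool_reusable:
        if found >= target:
            break
        scanned += 1
        if reusable:
            found += 1
    return scanned / len(pool_reusable)

POOL_SIZE = 20000

# Only 1 buffer in 100 is reusable (e.g. a pool full of busy dirty pages).
sparse = [i % 100 == 99 for i in range(POOL_SIZE)]
# 99 buffers in 100 are reusable.
dense = [i % 100 != 99 for i in range(POOL_SIZE)]

sparse_fraction = jit_scan(sparse, expected_allocs=95, multiplier=2.0)
dense_fraction = jit_scan(dense, expected_allocs=95, multiplier=2.0)

print(f"sparse pool: scanned {sparse_fraction:.1%} of the cache")
print(f"dense pool:  scanned {dense_fraction:.1%} of the cache")
```

With the same settings, the scan covers most of the pool when reusable
buffers are scarce and only a sliver of it when they are plentiful.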

The completely understandable line of thinking that led to your request
here is one of my concerns with exposing scan_whole_pool_seconds as a
tunable. It may suggest to people that if they set the number very low,
it will assure all dirty buffers will be scanned and written within that
time bound. That's certainly not the case; both the maxpages setting and
the usage count information will actually drive the speed at which that
mechanism plods through the buffer cache. It really isn't useful for
scanning fast.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
