Re: our buffer replacement strategy is kind of lame

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Greg Smith" <greg(at)2ndQuadrant(dot)com>,<pgsql-hackers(at)postgresql(dot)org>
Subject: Re: our buffer replacement strategy is kind of lame
Date: 2011-08-15 17:06:29
Message-ID: 4E490BC5020000250003FF03@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greg Smith <greg(at)2ndQuadrant(dot)com> wrote:

> Anyway, I think every idea thrown out here so far needs about an
> order of magnitude more types of benchmarking test cases before it
> can be evaluated at all.

Right. I'm very excited about all the optimizations going in, and
can't see where the ones committed so far are very likely to cause
problems for workloads other than those tested, but here we're
heading into territory where the performance farm is very
desperately needed.

That said, I'll throw out one more idea, as something that worked
very well for me in a similar situation, and played well with
external caching which knew lots about the storage media but nothing
about the database side. It's very similar to the clock sweep
algorithm, but uses a weighted average rather than a simple count.
It didn't tend to leave as many buffers clustered at one end or the
other, and the gradation did seem to be useful.

I started out this post trying to describe it more generally, but it
all came out sounding too vague and hand-wavy; so I'll give a
description with more particulars which could, of course, be tweaked
without tossing the concept. This particular description modifies
the techniques I've used to try to better fit the particulars of
PostgreSQL. It may fall down on contention issues, but since it
benchmarked better for me than the alternatives, I thought it might
at least help spin off other useful ideas.

(1) Each buffer would have a one byte count, incremented on access,
similar to the current count. Of course, 255 would not wrap on
access, but be left unchanged.

(2) We would have 256 int counters for how many buffers were
available with each access count. These would be maintained when
the access counts changed.

(3) While sweeping, we wouldn't decrement the access counters for
buffers we couldn't use; rather, there would be a boolean flag to
say whether to divide the access counts for non-chosen buffers by
two.

(4) The "reduce" flag would be set when any access count goes to
255, and cleared after one complete sweep through the buffers. (So,
obviously, we note the location when we set the flag.)

(5) To find the next buffer to evict, we would find out what is the
lowest count in use, and sweep forward until we found one.

(6) We would have a background process doing this sweeping and
count reduction which would try to keep a few evicted buffers in a
queue for quick pickup. This queue would be the only source for
getting a buffer. Getting a buffer would always be a very
lightweight operation if there's something in the queue. The
background task would try very hard to keep the queue non-empty,
only coming to rest when the queue was "full" -- whatever that was
chosen to mean (possibly based on the sort of adaptive calculations
currently used by the background writer).

There are a lot of interesting ways this could interact with the
background writer. One of the more intriguing to me would be for
the "sweep" process to estimate what current access count levels
will need to be used on its next sweep and queue up any at those
levels which are dirty for the background writer. This is vaguely
conceptually similar to the "wash marker" used in some LRU lists.

-Kevin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Steve Singer 2011-08-15 17:09:33 Re: walprotocol.h vs frontends
Previous Message Magnus Hagander 2011-08-15 16:50:43 Re: walprotocol.h vs frontends