Re: Readme of Buffer Management seems to have wrong sentence

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Readme of Buffer Management seems to have wrong sentence
Date: 2012-05-22 15:59:24
Message-ID: CA+TgmobuY=ko51nYniNJrABM_4LtH7J5G5XNMfZ87RH5G3Dw-Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 22, 2012 at 10:25 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The idea would be to have a background process (like bgwriter)
>> maintain the global LRU state and push candidate buffers onto the
>> freelist.
>
> Amit was trying to convince me of the same idea at PGCon, but I don't
> buy it.  bgwriter doesn't scan the buffer array nearly fast enough to
> provide useful adjustment of the usage counts under load.  And besides
> if the decrements are decoupled from the allocation requests it's no
> longer obvious that the algorithm is even an approximation of LRU.

Well, bgwriter is *supposed* to anticipate which buffers are about to
be allocated and clean any of those that are dirty. Having it
decrement the usage counts and stuff the resulting list of buffers
into a linked list seems like a pretty reasonable extension of that,
assuming that the anticipation works in the first place. If it
doesn't, then we need a rethink.

> But the larger issue here is that if that processing is a bottleneck
> (which I agree it is), how does it help to force a single process to
> be responsible for it?  Any real improvement in scalability here will
> need to decentralize the operation more, not less.

Sure. I think we could have the freelist and the clock sweep
protected by different locks. The background writer would lock out
other people running the clock sweep, but the freelist could be
protected by a spinlock which no one would ever need to take for more
than a few cycles. Right there, you should get a significant
scalability improvement, since the critical section would be so much
shorter than it is now. If that's not enough, you could have several
freelists protected by different spinlocks; the bgwriter would put
1/Nth of the reusable buffers on each freelist, and backends would
pick a freelist at random to pull buffers off of.
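
To make the several-freelists idea concrete, here is a rough sketch in
Python rather than PostgreSQL C. Everything in it is hypothetical: the
class name PartitionedFreelist, the use of threading.Lock as a stand-in
for a spinlock, the round-robin distribution by the filler, and the
fall-through to the other lists when the randomly chosen one is empty
(the text above only says backends pick a list at random).

```python
import random
import threading
from collections import deque


class PartitionedFreelist:
    """N independent freelists, each guarded by its own lock (illustrative only)."""

    def __init__(self, num_lists, rng=None):
        self.lists = [deque() for _ in range(num_lists)]
        # threading.Lock stands in for a spinlock in this sketch.
        self.locks = [threading.Lock() for _ in range(num_lists)]
        self.rng = rng or random.Random()
        self.next_list = 0  # round-robin cursor, touched only by the single filler

    def push(self, buf_id):
        """Called by the background filler: spread reusable buffers evenly."""
        i = self.next_list
        self.next_list = (i + 1) % len(self.lists)
        with self.locks[i]:
            self.lists[i].append(buf_id)

    def pop(self):
        """Called by backends: start at a random list, then try the others."""
        n = len(self.lists)
        start = self.rng.randrange(n)
        for step in range(n):
            i = (start + step) % n
            with self.locks[i]:  # held only for the length check and the pop
                if self.lists[i]:
                    return self.lists[i].popleft()
        return None  # every list is empty; the caller would have to sweep itself


fl = PartitionedFreelist(num_lists=4)
for buf_id in range(8):
    fl.push(buf_id)
per_list = [len(lst) for lst in fl.lists]   # 1/Nth of the buffers on each list
popped = sorted(fl.pop() for _ in range(8))
```

Each lock is held only for a length check and a single pop, which is
the point: the critical section shrinks to a few operations, and
contention is split N ways.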

> My own thoughts about this had pointed in the direction of getting rid
> of the central freelist entirely, instead letting each backend run its
> own independent clock sweep as needed.  The main problem with that is
> that if there's no longer any globally-visible clock sweep state, it's
> pretty hard to figure out what the control logic for the bgwriter should
> look like.  Maybe it would be all right to have global variables that
> are just statistics counters for allocations and buffers swept over,
> which backends would need to spinlock for just long enough to increment
> the counters at the end of each buffer allocation.

Hmm, that's certainly an interesting idea. I fear that if the clock
sweeps from the different backends ended up too closely synchronized,
you would end up evicting whatever was in the way, be it hot or cold.
It might almost be better to have individual backends choose buffers
to evict at random; if the chosen buffer isn't evictable, we decrement
its usage count and pick another one, also at random.
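
A minimal sketch of that random-probe idea, again in Python and again
hypothetical: the function name pick_victim_random, the max_tries
cutoff, and the choice to skip pinned buffers without touching their
usage counts are all assumptions made for illustration, not anything
that exists in the buffer manager.

```python
import random


def pick_victim_random(usage_counts, pinned, rng, max_tries=1000):
    """Choose a buffer to evict by random probing (illustrative only).

    usage_counts: list of ints, decremented in place as buffers are probed.
    pinned: set of buffer indexes that cannot be evicted right now.
    """
    n = len(usage_counts)
    for _ in range(max_tries):
        i = rng.randrange(n)
        if i in pinned:
            continue  # assumption: pinned buffers are skipped, count untouched
        if usage_counts[i] == 0:
            return i  # unpinned and cold: evict this one
        usage_counts[i] -= 1  # not evictable yet; age it and probe elsewhere
    return None  # gave up; caller needs a fallback


counts = [2, 0, 3, 1]
pinned = {1}
victim = pick_victim_random(counts, pinned, random.Random(0))
```

Since every backend probes independently, there is no shared clock
hand to synchronize on; the cost is that a hot buffer can be aged by
an unlucky run of probes.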

With respect to the control logic for the background writer, one
thought I had was to drop the notion that the background writer's job
is to write in advance of the strategy point. Instead, every time the
clock sweep passes over a dirty buffer that is otherwise evictable, we
add it to a queue of things that the bgwriter should clean. Those
buffers, once cleaned, go on the free list. Maybe some variant of
that could work with your idea.
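
Roughly, and only as a sketch under assumptions, that might look like
the following Python model. The names Buf, Pool, sweep_once and
bgwriter_pass are hypothetical, the "write" is just clearing a flag,
and the model ignores locking as well as the question of keeping the
sweep from handing out a buffer that is already on the free list.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Buf:
    usage_count: int = 0
    dirty: bool = False
    pinned: bool = False
    queued: bool = False  # already handed to the cleaner


class Pool:
    def __init__(self, bufs):
        self.bufs = bufs
        self.hand = 0
        self.clean_queue = deque()  # dirty but otherwise evictable
        self.freelist = deque()     # cleaned and ready for reuse

    def sweep_once(self):
        """Advance the clock hand one buffer; return a victim index or None."""
        i = self.hand
        self.hand = (self.hand + 1) % len(self.bufs)
        b = self.bufs[i]
        if b.pinned or b.queued:
            return None
        if b.usage_count > 0:
            b.usage_count -= 1
            return None
        if b.dirty:
            b.queued = True
            self.clean_queue.append(i)  # defer the write to the bgwriter
            return None
        return i  # clean, unpinned, count zero: evictable right now

    def bgwriter_pass(self):
        """Clean queued buffers and move them to the free list."""
        while self.clean_queue:
            i = self.clean_queue.popleft()
            b = self.bufs[i]
            b.queued = False
            if b.pinned or b.usage_count > 0:
                continue  # touched again since it was queued; not a candidate
            b.dirty = False  # stands in for the actual write
            self.freelist.append(i)


pool = Pool([Buf(usage_count=1), Buf(dirty=True), Buf()])
results = [pool.sweep_once() for _ in range(3)]
queued_before = list(pool.clean_queue)
pool.bgwriter_pass()
```

The bgwriter's work list is then driven directly by what the sweep
actually encountered, rather than by a guess about where the strategy
point will be.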

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
