Re: Design notes for BufMgrLock rewrite

From: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Design notes for BufMgrLock rewrite
Date: 2005-02-16 17:20:09
Message-ID: 20050216172009.GR52357@decibel.org
Lists: pgsql-hackers

On Sun, Feb 13, 2005 at 06:56:47PM -0500, Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Tom Lane wrote:
> >> One thing I realized quickly is that there is no natural way in a clock
> >> algorithm to discourage VACUUM from blowing out the cache. I came up
> >> with a slightly ugly idea that's described below. Can anyone do better?
>
> > Uh, is the clock algorithm also sequential-scan proof? Is that
> > something that needs to be done too?
>
> If you can think of a way. I don't see any way to make the algorithm
> itself scan-proof, but if we modified the bufmgr API to tell ReadBuffer
> (or better ReleaseBuffer) that a request came from a seqscan, we could
> do the same thing as for VACUUM. Whether that's good enough isn't
> clear --- for one thing it would kick up the contention for the
> BufFreelistLock, and for another it might mean *too* short a lifetime
> for blocks fetched by seqscan.

Is there anything (in the buffer headers?) that keeps track of buffer
access frequency? *BSD uses a mechanism to track roughly how often a page
in memory has been accessed, and uses that to determine what pages to
free. In 4.3BSD, a simple two-handed clock sweep is used: the first hand
clears a "referenced" bit on each page, and the second hand (which sweeps
a fixed distance behind the first) checks this bit; if it's still clear,
the page is moved to the inactive list if it's dirty, or to the cache
list if it's clean. There is also a free list, which is generally fed by
the cache and inactive lists.
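
A rough sketch of that two-handed sweep, as a toy model only and not the
actual BSD code. The class, the `referenced` and `dirty` flags, and the
`gap` parameter are all names made up for illustration:

```python
from dataclasses import dataclass


@dataclass
class Page:
    referenced: bool = False  # set on access, cleared by the front hand
    dirty: bool = False       # set when the page is written to


class TwoHandClock:
    """Toy two-handed clock; the back hand trails the front hand by `gap`."""

    def __init__(self, npages, gap):
        self.pages = [Page() for _ in range(npages)]
        self.front = 0
        self.gap = gap
        self.inactive = []  # indexes of dirty pages found unreferenced
        self.cache = []     # indexes of clean pages found unreferenced

    def touch(self, i, write=False):
        """Record an access to page i."""
        self.pages[i].referenced = True
        if write:
            self.pages[i].dirty = True

    def tick(self):
        """Advance both hands by one page."""
        n = len(self.pages)
        # Front hand: clear the referenced bit.
        self.pages[self.front].referenced = False
        # Back hand: if nobody touched the page since the front hand
        # passed, hand it to the inactive (dirty) or cache (clean) list.
        back = (self.front - self.gap) % n
        page = self.pages[back]
        if not page.referenced:
            (self.inactive if page.dirty else self.cache).append(back)
        self.front = (self.front + 1) % n
```

A page only survives if it gets touched in the window between the two
hands, so the gap sets how recent an access has to be to count.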

PostgreSQL has a big advantage over an OS though, in that it can
tolerate much more overhead in buffer access code than an OS can in its
VM system. If I understand correctly, any use of a buffer at all means a
lock needs to be acquired on its buffer header. As part of this access,
a counter could be incremented with very little additional cost. A
background process would then sweep through 'active' buffers,
decrementing this counter by some amount. Any buffer that was
decremented below 0 would be considered inactive, and a candidate for
being freed. The advantage of using a counter instead of a simple active
bit is that buffers that are (or have been) used heavily will be able to
go through several sweeps of the clock before being freed. Infrequently
used buffers (such as those from a vacuum or seq. scan) would get
marked as inactive the first time they were hit by the clock hand.
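
To make the counter idea concrete, here is a minimal sketch. It is only a
model of the proposal, not anything in the existing bufmgr code: `Buffer`,
`usage`, `access`, `sweep` and the `DECAY` value are all hypothetical
names and numbers. It assumes the sweep subtracts 2 per pass, so that a
buffer touched only once drops below 0 on its first sweep:

```python
DECAY = 2  # assumed amount subtracted per sweep; a tuning knob


class Buffer:
    def __init__(self, tag):
        self.tag = tag
        self.usage = 0      # bumped on every access
        self.active = True  # False once the sweep marks it inactive


def access(buf):
    """Record one use of the buffer.

    In the real thing this would happen while the buffer header lock
    is already held, so the increment costs very little extra.
    """
    buf.usage += 1
    buf.active = True


def sweep(buffers, decay=DECAY):
    """One pass of the background process over the active buffers.

    Returns the buffers that became inactive on this pass, i.e. the
    new candidates for being freed.
    """
    newly_inactive = []
    for buf in buffers:
        if not buf.active:
            continue
        buf.usage -= decay
        if buf.usage < 0:
            buf.active = False
            newly_inactive.append(buf)
    return newly_inactive
```

With these numbers, a buffer accessed five times survives two sweeps and
is marked inactive on the third, while a buffer accessed once (say, by a
seq. scan) is marked inactive on the first sweep that reaches it.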
--
Jim C. Nasby, Database Consultant decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
