Re: Bug: Buffer cache is not scan resistant

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Jim Nasby <decibel(at)decibel(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Luke Lonergan <LLonergan(at)greenplum(dot)com>, Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>, PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Doug Rady <drady(at)greenplum(dot)com>, Sherry Moore <sherry(dot)moore(at)sun(dot)com>
Subject:	Re: Bug: Buffer cache is not scan resistant
Date:	2007-03-06 18:29:17
Message-ID:	45EDB2FD.4070705@enterprisedb.com
Lists:	pgsql-hackers

Jeff Davis wrote:
> On Mon, 2007-03-05 at 21:02 -0700, Jim Nasby wrote:
>> On Mar 5, 2007, at 2:03 PM, Heikki Linnakangas wrote:
>>> Another approach I proposed back in December is to not have a
>>> variable like that at all, but scan the buffer cache for pages
>>> belonging to the table you're scanning to initialize the scan.
>>> Scanning all the BufferDescs is a fairly CPU and lock heavy
>>> operation, but it might be ok given that we're talking about large
>>> I/O bound sequential scans. It would require no DBA tuning and
>>> would work more robustly in varying conditions. I'm not sure where
>>> you would continue after scanning the in-cache pages. At the
>>> highest in-cache block number, perhaps.
>> If there was some way to do that, it'd be what I'd vote for.
>>
>
> I still don't know how to make this take advantage of the OS buffer
> cache.

Yep, I don't see any way to do that. I think we could live with that,
though. If we went with the sync_scan_offset approach, you'd have to
leave a lot of safety margin in that as well.

> However, no DBA tuning is a huge advantage, I agree with that.
>
> If I were to implement this idea, I think Heikki's bitmap of pages
> already read is the way to go. Can you guys give me some pointers about
> how to walk through the shared buffers, reading the pages that I need,
> while being sure not to read a page that's been evicted, and also not
> potentially causing a performance regression somewhere else?

You could take a look at BufferSync, for example. It walks through the
buffer cache, syncing all dirty buffers.

FWIW, I've attached a function I wrote some time ago when I was playing
with the same idea for vacuums. Each call to the new function continues
looping through the buffer cache and returns the next buffer that
belongs to a certain relation. I'm not sure that it's correct and safe,
and there aren't many comments, but it should work if you want to play
with it...
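
For readers without the attachment, here is a minimal standalone sketch of
the idea, not the contents of readanybuffer.patch. MockBufferDesc,
next_buffer_for_relation and mark_cached_blocks are hypothetical names
invented for illustration; they are not PostgreSQL's BufferDesc API. The
sketch also ignores the locking and pinning a real implementation would
need to keep a buffer from being evicted while it is examined.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a shared buffer descriptor. */
typedef struct {
    uint32_t rel_id;     /* relation that owns the cached page */
    uint32_t block_num;  /* block number within that relation */
    bool     valid;      /* false if the slot is empty or was evicted */
} MockBufferDesc;

/*
 * Return the index of the next valid buffer at or after *cursor that
 * belongs to rel_id, or -1 when the end of the array is reached.
 * *cursor is advanced past every slot examined, so repeated calls
 * walk the whole cache exactly once.
 */
static int
next_buffer_for_relation(const MockBufferDesc *bufs, size_t nbufs,
                         uint32_t rel_id, size_t *cursor)
{
    while (*cursor < nbufs) {
        size_t i = (*cursor)++;

        if (bufs[i].valid && bufs[i].rel_id == rel_id)
            return (int) i;
    }
    return -1;
}

/*
 * Set one bit per in-cache block of rel_id in bitmap (nblocks bits,
 * zeroed by the caller). Returns the highest in-cache block number,
 * or -1 if no block of the relation is cached. Blocks numbered
 * nblocks or higher are ignored.
 */
static long
mark_cached_blocks(const MockBufferDesc *bufs, size_t nbufs,
                   uint32_t rel_id, uint8_t *bitmap, size_t nblocks)
{
    size_t cursor = 0;
    long   highest = -1;
    int    i;

    while ((i = next_buffer_for_relation(bufs, nbufs, rel_id, &cursor)) >= 0) {
        uint32_t blk = bufs[i].block_num;

        if (blk >= nblocks)
            continue;
        bitmap[blk / 8] |= (uint8_t) (1u << (blk % 8));
        if ((long) blk > highest)
            highest = (long) blk;
    }
    return highest;
}
```

A scan could call mark_cached_blocks once at startup, process the marked
blocks first, and then continue from the returned block number, which is
the "highest in-cache block number" starting point suggested earlier in
the thread.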

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment	Content-Type	Size
readanybuffer.patch	text/x-patch	1.4 KB

In response to

Re: Bug: Buffer cache is not scan resistant at 2007-03-06 17:56:03 from Jeff Davis

Responses

Re: Bug: Buffer cache is not scan resistant at 2007-03-07 02:28:30 from Jeff Davis
