From: | Heikki Linnakangas <heikki(at)enterprisedb(dot)com> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Cc: | Jim Nasby <decibel(at)decibel(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Luke Lonergan <LLonergan(at)greenplum(dot)com>, Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>, PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Doug Rady <drady(at)greenplum(dot)com>, Sherry Moore <sherry(dot)moore(at)sun(dot)com> |
Subject: | Re: Bug: Buffer cache is not scan resistant |
Date: | 2007-03-06 18:29:17 |
Message-ID: | 45EDB2FD.4070705@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Jeff Davis wrote:
> On Mon, 2007-03-05 at 21:02 -0700, Jim Nasby wrote:
>> On Mar 5, 2007, at 2:03 PM, Heikki Linnakangas wrote:
>>> Another approach I proposed back in December is to not have a
>>> variable like that at all, but scan the buffer cache for pages
>>> belonging to the table you're scanning to initialize the scan.
>>> Scanning all the BufferDescs is a fairly CPU and lock heavy
>>> operation, but it might be ok given that we're talking about large
>>> I/O bound sequential scans. It would require no DBA tuning and
>>> would work more robustly in varying conditions. I'm not sure where
>>> you would continue after scanning the in-cache pages. At the
>>> highest in-cache block number, perhaps.
>> If there was some way to do that, it'd be what I'd vote for.
>>
>
> I still don't know how to make this take advantage of the OS buffer
> cache.
Yep, I don't see any way to do that. I think we could live with that,
though. If we went with the sync_scan_offset approach, you'd have to
leave a lot of safety margin in that as well.
> However, no DBA tuning is a huge advantage, I agree with that.
>
> If I were to implement this idea, I think Heikki's bitmap of pages
> already read is the way to go. Can you guys give me some pointers about
> how to walk through the shared buffers, reading the pages that I need,
> while being sure not to read a page that's been evicted, and also not
> potentially causing a performance regression somewhere else?
You could take a look at BufferSync, for example. It walks through the
buffer cache, syncing all dirty buffers.
FWIW, I've attached a function I wrote some time ago when I was playing
with the same idea for vacuums. A call to the new function loops through
the buffer cache and returns the next buffer that belong to a certain
relation. I'm not sure that it's correct and safe, and there's not much
comments, but should work if you want to play with it...
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
readanybuffer.patch | text/x-patch | 1.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2007-03-06 18:35:19 | Re: Auto creation of Partitions |
Previous Message | Florian G. Pflug | 2007-03-06 18:27:03 | Re: Auto creation of Partitions |