From: Jim Nasby <jim(at)nasby(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Should I implement DROP INDEX CONCURRENTLY?
Date: 2012-01-04 00:03:17
Message-ID: CA994BD8-9A07-40EB-A3CA-7C4A0A0B0E67@nasby.net
Lists: pgsql-hackers
On Jan 3, 2012, at 5:28 PM, Tom Lane wrote:
> Jim Nasby <jim(at)nasby(dot)net> writes:
>> On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
>>> This could well be related to the fact that DropRelFileNodeBuffers()
>>> does a scan of shared_buffers, which is an O(N) approach no matter the
>>> size of the index.
>
>> Couldn't we just leave the buffers alone? Once an index is dropped and that's pushed out through the catalog then nothing should be trying to access them and they'll eventually just get aged out.
>
> No, we can't, because if they're still dirty then the bgwriter would
> first try to write them to the no-longer-existing storage file. It's
> important that we kill the buffers immediately during relation drop.
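
(Side note for anyone following along: the scan Simon is referring to looks roughly like the sketch below. This is my simplification of what happens inside bufmgr.c -- fork/block checks, local buffers and error handling all elided, and the signature is from memory -- so treat it as illustrative only, not the actual code.)

/*
 * Simplified sketch of the drop-time scan: every buffer header in
 * shared_buffers is examined, whether or not it belongs to the relation
 * being dropped, which is why it's O(shared_buffers).
 */
void
DropRelFileNodeBuffers_sketch(RelFileNode rnode)
{
    int         i;

    for (i = 0; i < NBuffers; i++)
    {
        volatile BufferDesc *bufHdr = &BufferDescriptors[i];

        LockBufHdr(bufHdr);
        if (RelFileNodeEquals(bufHdr->tag.rnode, rnode))
            InvalidateBuffer(bufHdr);   /* bufmgr.c-internal helper; releases the header spinlock */
        else
            UnlockBufHdr(bufHdr);
    }
}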
>
> I'm still thinking that it might be sufficient to mark the buffers
> invalid and let the clock sweep find them, thereby eliminating the need
> for a freelist. Simon is after a different solution involving getting
> rid of the clock sweep, but he has failed to explain how that's not
> going to end up being the same type of contention-prone coding that we
> got rid of by adopting the clock sweep, some years ago. Yeah, the sweep
> takes a lot of spinlocks, but that only matters if there is contention
> for them, and the sweep approach avoids the need for a centralized data
> structure.
Yeah, but the problem we run into is that with every backend trying to run the clock on its own we end up with high contention again... it's just in a different place than when we had a true LRU. The clock sweep might be cheaper than the linked list was, but it's still awfully expensive. I believe our best bet is to have a free list that is actually useful in normal operations, and then to optimize the cost of pulling buffers out of that list as much as possible (and let the bgwriter deal with keeping enough pages in that list to satisfy demand).
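
Roughly the shape I have in mind (hand-wavy sketch, not a patch -- the structure, the function names and FREELIST_SIZE are all invented for illustration, though SpinLockAcquire/SpinLockRelease are the real primitives): the bgwriter keeps a real free list stocked as part of its normal cycle, and a backend that needs a victim just pops an entry off it, falling back to the clock sweep only when the list is empty.

#include "postgres.h"
#include "storage/spin.h"

#define FREELIST_SIZE 1024          /* invented for illustration */

/* Hypothetical structure living in shared memory. */
typedef struct BufferFreeList
{
    slock_t     lock;               /* spinlock protecting the list */
    int         nfree;              /* number of entries in free_ids[] */
    int         free_ids[FREELIST_SIZE];   /* buffer ids with usage_count == 0 */
} BufferFreeList;

/* bgwriter side: after cleaning a buffer, try to stash it on the free list. */
static void
freelist_push(BufferFreeList *fl, int buf_id)
{
    SpinLockAcquire(&fl->lock);
    if (fl->nfree < FREELIST_SIZE)
        fl->free_ids[fl->nfree++] = buf_id;
    SpinLockRelease(&fl->lock);
}

/* backend side: grab a victim without running the clock sweep, if we can. */
static int
freelist_pop(BufferFreeList *fl)
{
    int         buf_id = -1;

    SpinLockAcquire(&fl->lock);
    if (fl->nfree > 0)
        buf_id = fl->free_ids[--fl->nfree];
    SpinLockRelease(&fl->lock);

    return buf_id;                  /* -1 means "list empty, fall back to the sweep" */
}

Obviously a single spinlock there just concentrates the contention somewhere else, so the interesting work is in batching or partitioning that list -- but that's the shape of it.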
Heh, it occurs to me that the SQL analogy for how things work right now is that backends currently have to run a SeqScan (or 5) to find a free page... what we need to do is CREATE INDEX free ON buffers(buffer_id) WHERE count = 0;.
> (BTW, do we have a separate clock sweep hand for each backend? If not,
> there might be some low hanging fruit there.)
No... having multiple clock hands is an interesting idea, but I'm worried that it could get us into trouble if scores of backends were suddenly decrementing usage counts all over the place. For example, what if 5 backends all had their hands in basically the same place, all pointing at a very heavily used buffer? All 5 backends go for free space, each grabs the spinlock on that buffer in succession, and suddenly this highly used buffer that started with a usage count of 5 has been evicted. We could potentially use more than one hand, but I think the relation between the number of hands and the maximum usage count has to be tightly controlled.
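
To put numbers on that worry (toy simulation, nothing Postgres-specific in it beyond the constant, which matches BM_MAX_USAGE_COUNT today): five hands passing a hot buffer back-to-back take it from 5 to 0 before it ever gets another chance to be pinned, exactly as if one hand were sweeping five times as fast.

#include <stdio.h>

#define MAX_USAGE_COUNT 5   /* same value as BM_MAX_USAGE_COUNT today */
#define NUM_HANDS       5   /* hypothetical: one sweep hand per backend */

int
main(void)
{
    int     usage_count = MAX_USAGE_COUNT;  /* a maximally "hot" buffer */
    int     hand;

    /* Each hand visits the buffer once, decrementing it exactly as the
     * single clock hand does today. */
    for (hand = 1; hand <= NUM_HANDS; hand++)
    {
        if (usage_count > 0)
            usage_count--;
        printf("after hand %d: usage_count = %d\n", hand, usage_count);
    }

    /* usage_count is now 0: the next hand to arrive evicts the buffer,
     * even though it was heavily used a moment ago. */
    return 0;
}

So either the maximum usage count scales with the number of hands, or the hands have to be kept far enough apart that one buffer never sees several of them in quick succession.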
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net