From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
---|---|
To: | Andres Freund <andres(at)2ndquadrant(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org> |
Subject: | Re: Heap truncation without AccessExclusiveLock (9.4) |
Date: | 2013-05-20 19:43:05 |
Message-ID: | 519A7CC9.30409@vmware.com |
Lists: | pgsql-hackers |
On 17.05.2013 12:35, Andres Freund wrote:
> On 2013-05-17 10:45:26 +0300, Heikki Linnakangas wrote:
>> On 16.05.2013 04:15, Andres Freund wrote:
>>> Couldn't we "just" take the extension lock and then walk backwards from
>>> the rechecked end of the relation, calling ConditionalLockBufferForCleanup()
>>> on the buffers?
>>> For every such locked page we check whether it's still empty. If we find
>>> a page that we couldn't lock or that isn't empty, or we have already
>>> locked a sufficient number of pages, we truncate.
>>
>> You need an AccessExclusiveLock on the relation to make sure that after you
>> have checked that pages 10-15 are empty, and truncated them away, a backend
>> doesn't come along a few seconds later and try to read page 10 again. There
>> might be an old sequential scan in progress, for example, that thinks that
>> the pages are still there.
>
> But that seems easily enough handled: We know the current page in its
> scan cannot be removed since it's pinned. So make
> heapgettup()/heapgetpage() pass something like RBM_IFEXISTS to
> ReadBuffer, and if the read fails, recheck the length of the relation
> before throwing an error.

Hmm. For the above to work, you'd need to atomically check that the
pages you're truncating away are not pinned, and truncate them. If those
steps are not atomic, a backend might pin a page after you've checked
that it's not pinned, but before you've truncated the underlying file. I
guess that would be doable; it needs some new infrastructure in the buffer
manager, however.
> There isn't much besides seqscans that can have that behaviour afaics:
> - (bitmap)indexscans et al. won't point to completely empty pages
> - there cannot be a concurrent vacuum since we have the appropriate
> locks
> - if a trigger or something else has a tid referencing a page, there need
> to be unremovable tuples on it.
>
> The only thing that I immediately see are tidscans which should be
> handleable in a similar manner to seqscans.
>
> Sure, there are some callsites that need to be adapted but it still
> seems noticeably easier than what you proposed upthread.

Yeah. I'll think some more about how the required buffer manager changes
could be done.
- Heikki