From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash Indexes
Date: 2016-10-05 16:52:51
Message-ID: CA+TgmoZ64uTdvH8fe8LLo14qQUc8byruVALfgFfOc15S3wzE8A@mail.gmail.com
Lists: pgsql-hackers
On Tue, Oct 4, 2016 at 12:36 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> I think one way to avoid the risk of deadlock in the above scenario is to
> take the cleanup lock conditionally: if we get the cleanup lock, then
> we will delete the items as we are doing in the patch now; otherwise, we will
> just mark the tuples as dead and ensure that we don't try to remove
> tuples that are moved-by-split. Now, I think the question is how
> these dead tuples will be removed. We need a separate mechanism to
> clear dead tuples for hash indexes anyway, because during scans we mark
> tuples as dead when the corresponding heap tuple is dead, and those
> are never removed later. This is already taken care of in the btree code
> via the kill_prior_tuple optimization. So I think clearing of dead tuples
> can be handled by a separate patch.
That seems like it could work. The hash scan code will need to be
made smart enough to ignore any tuples marked dead, if it isn't
already. More aggressive cleanup can be left for another patch.
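To make the division of labor concrete, here is a toy model (in Python, with invented names; this is not PostgreSQL code) of the minimal scheme: one pass merely sets a dead hint bit, analogous to LP_DEAD in an item pointer, and the scan code skips any item so marked:

```python
# Toy model of an index page: each line item carries a "dead" hint bit,
# loosely analogous to LP_DEAD on a PostgreSQL item pointer. All names
# here are invented for illustration.

def mark_dead(page, tid):
    """Level-1 cleanup: flip the dead bit without moving anything.
    Only a content lock is needed, since no tuple data is relocated."""
    for item in page:
        if item["tid"] == tid:
            item["dead"] = True

def scan(page):
    """A scan must skip items already marked dead."""
    return [item["tid"] for item in page if not item["dead"]]

page = [{"tid": (0, 1), "dead": False},
        {"tid": (0, 2), "dead": False},
        {"tid": (0, 3), "dead": False}]

mark_dead(page, (0, 2))
print(scan(page))   # the dead tuple is ignored: [(0, 1), (0, 3)]
```

The point of the sketch is that marking and skipping are cheap and need no coordination with concurrent scans beyond the usual content lock; actually reclaiming the space is deferred.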
> I have also thought about using the page-scan-at-a-time idea which has
> been discussed upthread [1], but I think we can't completely eliminate
> the need to out-wait scans (the cleanup lock requirement) for scans that
> are started while a split is in progress, or for non-MVCC scans, as described
> in that e-mail [1]. We might be able to find some way to solve the
> problem with this approach, but I think it will be somewhat
> complicated, and much more work is required compared to the previous
> approach.
There are several levels of aggressiveness here with different locking
requirements:
1. Mark line items dead without reorganizing the page. Needs an
exclusive content lock, no more. Even a shared content lock may be
OK, as for other opportunistic bit-flipping.
2. Mark line items dead and compact the tuple data. If a pin is
sufficient to look at tuple data, as it is for the heap, then a
cleanup lock is required here. But if we always hold a shared content
lock when looking at the tuple data, it might be possible to do this
with just an exclusive content lock.
3. Remove dead line items completely, compacting the tuple data and
the item-pointer array. Doing this with only an exclusive content
lock certainly needs page-at-a-time mode because otherwise a searcher
that resumes a scan later might resume from the wrong place. It also
needs the guarantee mentioned for point #2, namely that nobody will be
examining the tuple data without a shared content lock.
4. Squeezing the bucket. This is probably always going to require a
cleanup lock, because otherwise it's pretty unclear how a concurrent
scan could be made safe. I suppose the scan could remember every TID
it has seen, somehow detect that a squeeze had happened, and rescan
the whole bucket ignoring TIDs already returned, but that seems to
require the client to use potentially unbounded amounts of memory to
remember already-returned TIDs, plus an as-yet-uninvented mechanism
for detecting that a squeeze has happened. So this seems like a
dead-end to me.
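The key difference between levels 1 and 3 can be sketched in a few lines (a toy model with invented structures, not PostgreSQL code): level 1 leaves every surviving item at its original offset, while level 3 shifts them, which is exactly why a scan that remembers "resume at offset N" needs page-at-a-time mode or a cleanup lock:

```python
# Toy page with an item array, mirroring cleanup levels 1 and 3 above.
# Invented structures for illustration only.

def level1_mark_dead(items, idx):
    # Level 1: set the dead flag; offsets of other items are unchanged.
    items[idx]["dead"] = True

def level3_remove_dead(items):
    # Level 3: physically remove dead items, compacting the array.
    # Surviving items shift position, so a scan that planned to
    # "resume at offset N" would now resume at the wrong tuple --
    # hence the need for page-at-a-time mode (or a cleanup lock).
    return [it for it in items if not it["dead"]]

items = [{"key": k, "dead": False} for k in ("a", "b", "c", "d")]
level1_mark_dead(items, 1)          # kill "b"
assert items[2]["key"] == "c"       # level 1: "c" still at offset 2

items = level3_remove_dead(items)
assert items[1]["key"] == "c"       # level 3: "c" has moved to offset 1
```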
I think that it is very much worthwhile to reduce the required lock
strength from cleanup-lock to exclusive-lock in as many cases as
possible, but I don't think it will be possible to completely
eliminate the need to take the cleanup lock in some cases. However,
if we can always take the cleanup lock conditionally and never be in a
situation where it's absolutely required, we should be OK - and even
level (1) gives you that.
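The conditional strategy might look roughly like this (a hedged sketch with invented names; PostgreSQL's actual buffer API differs): attempt the cleanup lock, and if a concurrent scan still holds a pin, fall back to level (1) and merely set dead bits under the content lock:

```python
# Sketch of the conditional-cleanup-lock strategy described above.
# try_cleanup_lock stands in for a non-blocking cleanup-lock attempt;
# all names are invented, and this is not PostgreSQL's API.

def vacuum_bucket_page(page, dead_tids, try_cleanup_lock):
    if try_cleanup_lock():
        # Cleanup lock acquired: remove the dead items outright.
        page[:] = [it for it in page if it["tid"] not in dead_tids]
        return "removed"
    # Contended: just set dead bits; a later pass can reclaim the space.
    for it in page:
        if it["tid"] in dead_tids:
            it["dead"] = True
    return "marked"

page = [{"tid": t, "dead": False} for t in (1, 2, 3)]
assert vacuum_bucket_page(page, {2}, lambda: False) == "marked"
assert len(page) == 3               # nothing removed, only hinted
assert vacuum_bucket_page(page, {2}, lambda: True) == "removed"
assert len(page) == 2
```

Because the fallback path always succeeds, vacuum never blocks on a scan, which is what removes the deadlock risk discussed upthread.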
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company