From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash Indexes
Date: 2016-06-16 07:28:58
Message-ID: CAA4eK1L=OCH2-Vh1JXLBzktfjHMOkchZ3jSL3cvAXGXc-B9AqA@mail.gmail.com
Lists: pgsql-hackers
On Tue, May 10, 2016 at 5:39 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:
>
>
> Incomplete Splits
> --------------------------
> Incomplete splits can be completed either by vacuum or by insert, as both
> need an exclusive lock on the bucket. If vacuum finds the split-in-progress
> flag on a bucket, it will complete the split operation; vacuum won't see
> this flag while a split is actually in progress on that bucket, because
> vacuum needs a cleanup lock and the split retains its pin till the end of
> the operation. To make it work for the insert operation, one simple idea
> could be that if insert finds the split-in-progress flag, it releases its
> current exclusive lock on the bucket and tries to acquire a cleanup lock
> on the bucket; if it gets the cleanup lock, it can complete the split and
> then insert the tuple, else it keeps the exclusive lock on the bucket and
> just performs the insertion of the tuple. The disadvantage of trying to
> complete the split in vacuum is that the split might require new pages,
> and allocating new pages at the time of vacuum is not advisable. The
> disadvantage of doing it at the time of insert is that insert might skip
> it even if some scan on the bucket is going on, as the scan will also
> retain a pin on the bucket, but I think that is not a big deal. The actual
> completion of the split can be done in two ways: (a) scan the new bucket
> and build a hash table with all of the TIDs you find there; when copying
> tuples from the old bucket, first probe the hash table, and if you find a
> match, just skip that tuple (idea suggested by Robert Haas offlist), or
> (b) delete all the tuples that are marked as moved_by_split in the new
> bucket and perform the split operation from the beginning using the old
> bucket.
>
>
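For concreteness, a minimal sketch of the hash-table approach in (a) could
look like the following. The bucket-walking helpers (the names ending in
_sketch) are hypothetical placeholders rather than the patch's actual
functions; only the dynahash and list calls are existing PostgreSQL APIs.

#include "postgres.h"
#include "access/itup.h"
#include "nodes/pg_list.h"
#include "storage/bufmgr.h"
#include "utils/hsearch.h"
#include "utils/rel.h"

/* Sketch only: complete an interrupted split using approach (a). */
static void
complete_split_sketch(Relation rel, Buffer oldbuf, Buffer newbuf)
{
    HASHCTL     ctl;
    HTAB       *tidtab;
    List       *tuples;
    ListCell   *lc;

    memset(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(ItemPointerData);
    ctl.entrysize = sizeof(ItemPointerData);
    tidtab = hash_create("incomplete-split TIDs", 256, &ctl,
                         HASH_ELEM | HASH_BLOBS);

    /* 1. Remember the heap TID of every tuple already in the new bucket. */
    tuples = collect_bucket_tuples_sketch(rel, newbuf);
    foreach(lc, tuples)
    {
        IndexTuple  itup = (IndexTuple) lfirst(lc);

        hash_search(tidtab, &itup->t_tid, HASH_ENTER, NULL);
    }

    /* 2. Re-run the split; probe the table to skip already-moved tuples. */
    tuples = collect_bucket_tuples_sketch(rel, oldbuf);
    foreach(lc, tuples)
    {
        IndexTuple  itup = (IndexTuple) lfirst(lc);
        bool        found;

        if (!tuple_maps_to_new_bucket_sketch(rel, itup))
            continue;           /* tuple stays in the old bucket */

        hash_search(tidtab, &itup->t_tid, HASH_FIND, &found);
        if (!found)
            move_tuple_to_new_bucket_sketch(rel, newbuf, itup);
    }

    hash_destroy(tidtab);

    /* 3. Finally, clear the split-in-progress marking on both buckets. */
    clear_split_in_progress_sketch(rel, oldbuf, newbuf);
}
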
I have completed the patch with respect to incomplete splits and delayed
cleanup of garbage tuples. For incomplete splits, I have used option (a)
as mentioned above: the incomplete split is completed if an insertion sees
the split-in-progress flag on a bucket.

The second major thing this new version of the patch achieves is the
cleanup of garbage tuples, i.e., the tuples that are left in the old bucket
during a split. Currently (in HEAD), as part of a split operation, we clean
the tuples from the old bucket after moving them to the new bucket, as we
hold heavy-weight locks on both the old and the new bucket for the duration
of the split. In the new design, we take a cleanup lock on the old bucket
and an exclusive lock on the new bucket to perform the split, and we don't
retain those locks till the end (we release the lock as we move on to
overflow buckets). Now, to clean up the tuples we need a cleanup lock on
the bucket, which we might not have at split-end. So I chose to perform the
cleanup of garbage tuples during vacuum and when a re-split of the bucket
happens, as during both of those operations we do hold a cleanup lock. We
can extend the cleanup of garbage to other operations as well if required.
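To illustrate what the deferred cleanup amounts to, here is a rough sketch
of removing the garbage tuples once a cleanup lock is held. The
moved-by-split test is a hypothetical placeholder; the page and buffer
calls are existing PostgreSQL APIs, and WAL logging is omitted.

#include "postgres.h"
#include "access/itup.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"
#include "utils/rel.h"

/* Sketch only: caller (vacuum, or a re-split of this bucket) already holds
 * a cleanup lock, so no scan can be in flight and removing tuples is safe. */
static void
cleanup_garbage_sketch(Relation rel, Buffer bucket_buf)
{
    Page        page = BufferGetPage(bucket_buf);
    OffsetNumber deletable[MaxOffsetNumber];
    int         ndeletable = 0;
    OffsetNumber off,
                maxoff = PageGetMaxOffsetNumber(page);

    for (off = FirstOffsetNumber; off <= maxoff; off = OffsetNumberNext(off))
    {
        ItemId      itemid = PageGetItemId(page, off);
        IndexTuple  itup = (IndexTuple) PageGetItem(page, itemid);

        /* hypothetical test for the "moved during a previous split" marker */
        if (tuple_is_moved_by_split_sketch(itup))
            deletable[ndeletable++] = off;
    }

    if (ndeletable > 0)
    {
        START_CRIT_SECTION();
        PageIndexMultiDelete(page, deletable, ndeletable);
        MarkBufferDirty(bucket_buf);
        /* WAL logging of the deletion omitted in this sketch */
        END_CRIT_SECTION();
    }
}
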
I have done some performance tests with this new version of the patch, and
the results are along the same lines as in my previous e-mail. I have also
done some functional testing of the patch. I think more detailed testing is
required, but it is better to do that once the design is discussed and
agreed upon.
I have improved the code comments to make the new design clear, but one can
still have questions about the locking decisions I have taken in the patch.
I think one of the important things to verify in the patch is the locking
strategy used for the different operations: I have changed the heavy-weight
locks to light-weight read and write locks, plus a cleanup lock for the
vacuum and split operations.
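To make that concrete, the intended mapping from operations to buffer locks
is roughly as follows. This is my summary of the design rather than code
lifted from the patch, though the bufmgr calls themselves are existing
PostgreSQL APIs.

#include "postgres.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"

/* Sketch only: lock usage per operation under the new design. */
static void
bucket_locking_sketch(Relation rel, BlockNumber bucket_blkno)
{
    Buffer      buf = ReadBuffer(rel, bucket_blkno);

    /* Scan: pin plus a shared ("read") content lock on the bucket page. */
    LockBuffer(buf, BUFFER_LOCK_SHARE);
    LockBuffer(buf, BUFFER_LOCK_UNLOCK);    /* the pin is kept until scan end */

    /* Insert: pin plus an exclusive ("write") content lock. */
    LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
    LockBuffer(buf, BUFFER_LOCK_UNLOCK);

    /*
     * Vacuum and split: a cleanup lock, i.e. an exclusive content lock held
     * while we are the sole pin holder.  Vacuum can afford to wait for it.
     */
    LockBufferForCleanup(buf);
    LockBuffer(buf, BUFFER_LOCK_UNLOCK);

    /*
     * An insert that sees the split-in-progress flag only tries once: if the
     * cleanup lock is available it completes the split first, otherwise it
     * falls back to inserting under its plain exclusive lock.
     */
    if (ConditionalLockBufferForCleanup(buf))
        LockBuffer(buf, BUFFER_LOCK_UNLOCK);

    ReleaseBuffer(buf);
}
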
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachment: concurrent_hash_index_v2.patch (application/octet-stream, 70.6 KB)