Re: Hash Indexes

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash Indexes
Date: 2016-06-22 09:10:45
Message-ID: CAA4eK1L1K9CokG6OjZwxoXdR9AgKty7J1mOgJa9uV7ghmebyQQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jun 21, 2016 at 9:26 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Tue, May 10, 2016 at 8:09 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> > Once the split operation has set the split-in-progress flag, it will
> > begin scanning bucket (N+1)/2. Every time it finds a tuple that properly
> > belongs in bucket N+1, it will insert the tuple into bucket N+1 with the
> > moved-by-split flag set. Tuples inserted by anything other than a split
> > operation will leave this flag clear, and tuples inserted while the split
> > is in progress will target the same bucket that they would hit if the split
> > were already complete. Thus, bucket N+1 will end up with a mix of
> > moved-by-split tuples, coming from bucket (N+1)/2, and unflagged tuples
> > coming from parallel insertion activity. When the scan of bucket (N+1)/2
> > is complete, we know that bucket N+1 now contains all the tuples that are
> > supposed to be there, so we clear the split-in-progress flag on both
> > buckets. Future scans of both buckets can proceed normally. Split
> > operation needs to take a cleanup lock on primary bucket to ensure that it
> > doesn't start if there is any Insertion happening in the bucket. It will
> > leave the lock on primary bucket, but not pin as it proceeds for next
> > overflow page. Retaining pin on primary bucket will ensure that vacuum
> > doesn't start on this bucket till the split is finished.
>
> In the second-to-last sentence, I believe you have reversed the words
> "lock" and "pin".
>

Yes. What I mean to say is that it releases the lock, but retains the pin on
the primary bucket till the end of the operation.
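
To make the lock/pin distinction concrete, here is a minimal sketch in
Python. It is a toy model only, not PostgreSQL code: the Buffer class, its
method names, and the backend labels "split" and "vacuum" are all
hypothetical. It assumes a cleanup lock is granted only when the requester
holds the sole pin on the buffer, and it uses a non-blocking "try" where the
real cleanup lock would wait.

```python
class Buffer:
    """Toy stand-in for the buffer holding a primary bucket page."""

    def __init__(self):
        self.pinned_by = set()   # backends currently holding a pin
        self.locked_by = None    # backend holding the exclusive lock, if any

    def pin(self, who):
        self.pinned_by.add(who)

    def unpin(self, who):
        self.pinned_by.discard(who)

    def try_cleanup_lock(self, who):
        """Succeed only if the buffer is unlocked and `who` holds the sole pin."""
        if self.locked_by is None and self.pinned_by == {who}:
            self.locked_by = who
            return True
        return False

    def unlock(self, who):
        if self.locked_by == who:
            self.locked_by = None


def split_vs_vacuum():
    primary = Buffer()
    events = []

    # Split pins the primary bucket and takes a cleanup lock, so it cannot
    # start while some insertion still holds a pin.
    primary.pin("split")
    events.append(("split gets cleanup lock", primary.try_cleanup_lock("split")))

    # Moving on to the next overflow page: release the lock, retain the pin.
    primary.unlock("split")

    # Vacuum shows up mid-split; the pin retained by the split keeps it out.
    primary.pin("vacuum")
    events.append(("vacuum during split", primary.try_cleanup_lock("vacuum")))

    # The split finishes and drops its pin; vacuum can now get the lock.
    primary.unpin("split")
    events.append(("vacuum after split", primary.try_cleanup_lock("vacuum")))
    return events
```

Calling split_vs_vacuum() shows vacuum being refused while the split still
holds its pin and admitted once the pin is dropped, even though the split
gave up the lock itself much earlier.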

> > Insertion will happen by scanning the appropriate bucket and needs to
> > retain pin on primary bucket to ensure that concurrent split doesn't
> > happen, otherwise split might leave this tuple unaccounted.
>
> What do you mean by "unaccounted"?
>

It means that the split might leave this tuple in the old bucket even though
it should be moved to the new bucket. Consider a case where an insertion has
to add a tuple to some intermediate overflow page in the bucket chain. If we
allow a split to start while that insertion is in progress, the split might
already have scanned past that page and so would not move the newly inserted
tuple.
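
The interleaving described above can be sketched as follows. This is a toy
Python model under stated assumptions, not the actual hash index code: pages
are plain lists, the split is a generator that pauses after each page, and
belongs_in_new() (odd keys go to the new bucket) is a hypothetical stand-in
for the real hash-to-bucket mapping.

```python
def belongs_in_new(key):
    # Hypothetical rule for illustration: odd keys belong in the new bucket.
    return key % 2 == 1


def split_steps(old_chain, new_bucket):
    """Scan the old bucket chain one page at a time, copying qualifying
    tuples to the new bucket. Each yield marks a point where another
    backend could run."""
    for page in old_chain:
        for key in page:
            if belongs_in_new(key):
                new_bucket.append(key)
        yield


def simulate_unpinned_insert():
    # Primary page followed by two overflow pages.
    old_chain = [[2, 3], [4], [5, 6]]
    new_bucket = []

    steps = split_steps(old_chain, new_bucket)
    next(steps)  # split has scanned the primary page
    next(steps)  # split has scanned the first overflow page

    # An insertion that chose the old bucket before the split started, and
    # holds no pin, now lands on the already-scanned overflow page.
    old_chain[1].append(7)

    for _ in steps:  # split scans the remaining page and finishes
        pass

    stranded = [k for page in old_chain for k in page
                if belongs_in_new(k) and k not in new_bucket]
    return new_bucket, stranded
```

In this model key 7 ends up stranded in the old bucket, which is the
"unaccounted" case. If the insertion keeps a pin on the primary bucket, the
split cannot get its cleanup lock and so cannot start until the insertion is
done.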

> > Now for deletion of tuples from (N+1/2) bucket, we need to wait for the
> > completion of any scans that began before we finished populating bucket
> > N+1, because otherwise we might remove tuples that they're still expecting
> > to find in bucket (N+1)/2. The scan will always maintain a pin on primary
> > bucket and Vacuum can take a buffer cleanup lock (cleanup lock includes
> > Exclusive lock on bucket and wait till all the pins on buffer becomes zero)
> > on primary bucket for the buffer. I think we can relax the requirement for
> > vacuum to take cleanup lock (instead take Exclusive Lock on buckets where
> > no split has happened) with the additional flag has_garbage which will be
> > set on primary bucket, if any tuples have been moved from that bucket,
> > however I think for squeeze phase (in this phase, we try to move the tuples
> > from later overflow pages to earlier overflow pages in the bucket and then
> > if there are any empty overflow pages, then we move them to kind of a free
> > pool) of vacuum, we need a cleanup lock, otherwise scan results might get
> > effected.
>
> affected, not effected.
>
> I think this is basically correct, although I don't find it to be as
> clear as I think it could be. It seems very clear that any operation
> which potentially changes the order of tuples in the bucket chain,
> such as the squeeze phase as currently implemented, also needs to
> exclude all concurrent scans. However, I think that it's OK for
> vacuum to remove tuples from a given page with only an exclusive lock
> on that particular page.
>

How can we guarantee that it doesn't remove a tuple that is required by a
scan which started after the split-in-progress flag was set?
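
On the squeeze phase, where we agree that concurrent scans must be excluded,
the hazard can be sketched like this. It is a simplified Python model, not
the real implementation: pages are lists, page capacity is a made-up
parameter, and the scan is assumed to read each page's contents once and
never revisit a page.

```python
def scan(chain):
    """Yield keys page by page; each page is read once, when it is reached."""
    for page in chain:
        for key in list(page):  # contents as of the moment the page is read
            yield key


def squeeze(chain, capacity):
    """Move tuples from later pages into free space on earlier pages, then
    drop trailing overflow pages that became empty."""
    lo, hi = 0, len(chain) - 1
    while lo < hi:
        if len(chain[lo]) >= capacity:
            lo += 1                      # earlier page is full
        elif not chain[hi]:
            hi -= 1                      # later page is already empty
        else:
            chain[lo].append(chain[hi].pop())
    while len(chain) > 1 and not chain[-1]:
        chain.pop()                      # never drops the primary page


def scan_with_concurrent_squeeze():
    chain = [[10, 11], [12, 13, 14], [15]]
    expected = [k for page in chain for k in page]

    scanner = scan(chain)
    seen = [next(scanner), next(scanner)]  # scan has consumed the primary page

    squeeze(chain, capacity=3)             # runs without excluding the scan

    seen.extend(scanner)                   # scan resumes with later pages
    missed = [k for k in expected if k not in seen]
    return seen, missed
```

Here the squeeze moves key 15 onto the primary page after the scan has
already passed it, so the scan never returns it. That is the kind of wrong
result a cleanup lock on the primary bucket is meant to prevent.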

> Also, I think that when cleaning up after a
> split, an exclusive lock is likewise sufficient to remove tuples from
> a particular page provided that we know that every scan currently in
> progress started after split-in-progress was set.
>

I think this could have an issue similar to the one above, unless we have
something that prevents concurrent scans.

>
> (Plain text email is preferred to HTML on this mailing list.)
>

If I switch to Plain text [1], then the signature of my e-mail also changes
to Plain text, which I don't want. Is there a way I can retain the signature
settings in Rich Text and the mail content as Plain Text?

[1] - http://www.mail-signatures.com/articles/how-to-add-or-change-an-email-signature-in-gmailgoogle-apps/

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
