Re: Hash Indexes

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash Indexes
Date: 2016-10-11 05:34:47
Message-ID: CAA4eK1KFRnH8prnyzTWyO8X-ihGay14jBwPWN464c_idxVuTDA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 10, 2016 at 10:07 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Mon, Oct 10, 2016 at 5:55 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
>>
>> On Thu, Sep 29, 2016 at 8:27 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> wrote:
>> > On Thu, Sep 29, 2016 at 6:04 AM, Robert Haas <robertmhaas(at)gmail(dot)com>
>> > wrote:
>> >
>> >> Another thing I don't quite understand about this algorithm is that in
>> >> order to conditionally lock the target primary bucket page, we'd first
>> >> need to read and pin it. And that doesn't seem like a good thing to
>> >> do while we're holding a shared content lock on the metapage, because
>> >> of the principle that we don't want to hold content locks across I/O.
>> >>
>> >
>>
>> Aren't we already doing this during BufferAlloc() when the buffer
>> selected by StrategyGetBuffer() is dirty?
>
>
> Right, you probably shouldn't allocate another buffer while holding a
> content lock on a different one, if you can help it.
>

I don't see the problem in that, but I guess the simple rule is that
we should not hold content locks for longer duration, which could
happen if we do I/O, or need to allocate a new buffer.

> But, BufferAlloc
> doesn't do that internally, does it?
>

You are right that BufferAlloc() doesn't allocate a new buffer while
holding content lock on another buffer, but it does perform I/O while
holding content lock.

> It is only a problem if you make it be
> one by the way you use it. Am I missing something?
>
>>
>>
>> > I think we can release metapage content lock before reading the buffer.
>> >
>>
>> On thinking about this again, if we release the metapage content lock
>> before reading and pinning the primary bucket page, then we need to
>> take it again to verify if the split has happened during the time we
>> don't have a lock on a metapage. Releasing and again taking content
>> lock on metapage is not
>> good from the performance aspect. Do you have some other idea for this?
>
>
> Doesn't the relcache patch effectively deal wit hthis? If this is a
> sticking point, maybe the relcache patch could be incorporated into this
> one.
>

Yeah, relcache patch would eliminate the need for metapage locking,
but that is not a blocking point. As this patch is mainly to enable
WAL logging, so there is no urgency to incorporate relcache patch,
even if we have to go with an algorithm where we need to take the
metapage lock twice to verify the splits. Having said that, I am
okay, if Robert and or others are also in favour of combining the two
patches (patch in this thread and cache the metapage patch). If we
don't want to hold content lock across another ReadBuffer call, then
another option could be to modify the read algorithm as below:

read the metapage
compute bucket number for target hash key based on metapage contents
read the required block
loop:
acquire shared content lock on metapage
recompute bucket number for target hash key based on metapage contents
if the recomputed block number is not same as the block number we read
release meta page content lock
read the recomputed block number
else
break;
if (we can't get a shared content lock on the target bucket without blocking)
loop:
release meta page content lock
take a shared content lock on the target primary bucket page
take a shared content lock on the metapage
if (previously-computed target bucket has not been split)
break;

The basic change here is that first we compute the target block number
*without* locking metapage and then after locking the metapage, if
both doesn't match, then we need to again read the computed block
number.

Thoughts?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2016-10-11 05:49:08 Re: PL/Python adding support for multi-dimensional arrays
Previous Message Masahiko Sawada 2016-10-11 05:19:08 Re: Autovacuum launcher process launches worker process at high frequency