Re: [WIP] [B-Tree] Keep indexes sorted by heap physical location

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [WIP] [B-Tree] Keep indexes sorted by heap physical location
Date: 2016-08-20 04:05:31
Message-ID: CAA4eK1JKm=bRZf4WcF9asejEGMi5kQaTHbqHALorVAX3M9w64Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 18, 2016 at 8:24 AM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
>
> A couple of points make me uneasy about this patch, yet I can think of
> no better alternative, so I seek feedback:
>
> - introducing a different format for inner index tuples makes for an
> invasive patch and quite difficult-to-reason code (it's easy to forget
> whether a page is leaf or inner and that now causes assertion failures
> or segfaults)
> - the ascent-descent to check for uniqueness when there are large
> dead tuple runs could have locking subtleties that escape me. Perhaps
> there's better ways to solve this.

I have tried to study this part of your patch and it seems somewhat
non-trivial and risky part of proposal.

+ } else {
+ /*
+ * We found the actual first item on another block, so we have to perform
+ * a two-step search - first half, until our write-locked buffer, then another
+ * starting from our write-locked buffer.
+ */
+ LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+ LockBuffer(buf, BT_WRITE);
+
+ buf = _bt_moveright(rel, buf, natts, itup_scankey, &(itup->t_tid), false,
+ true, stack, BT_WRITE, NULL);
+
+ first_offset = _bt_binsrch(rel, buf, natts, itup_scankey, NULL, false, NULL);
+
+ xwait = _bt_check_unique(rel, itup, heapRel, nbuf, buf,
first_offset, itup_scankey,
+ checkUnique, &is_unique, &speculativeToken);
+
+ _bt_relbuf(rel, nbuf);
+ }

The idea for uniqueness check is that we hold the write lock on
buffer/page on which we are trying to operate (we do take read locks
on the consecutive pages during this check). Here, in your changed
proposal, you have two buffers (one to which the search led you and
one buffer previous in the chain) and before checking uniqueness, on
one of the buffers you seem to have write lock and on other you have
read lock. This seems to change the way uniqueness check works till
now, can you explain how this works (can we safely assume that all
searches for uniqueness check will lead to the same buffer first).

With this new mechanism, do we have two type of search interfaces
where one would work for keys (as now) and other for key-ctid or it
will be just a single interface which works both ways? I think either
way there are pros and cons.

Also, in the thread below you are talking about using the last bit in
t_info, I want to bring it to your notice that there is a patch of
mine [1] in which I have used it to avoid changing on-disk
compatibility of hash indexes. I am not saying that we should not
plan it for some other use, but just to keep in mind that there are
other proposals to use it. We can decide what is best way to proceed,
if we are aware of all the proposals that want to use it.

[1] - https://www.postgresql.org/message-id/CAA4eK1LkQ_Udism-Z2Dq6cUvjH3dB5FNFNnEzZBPsRjw0haFqA@mail.gmail.com
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2016-08-20 04:31:24 TODO list updated for PG 10
Previous Message Amit Kapila 2016-08-20 02:50:35 Re: Should we cacheline align PGXACT?