From: | Claudio Freire <klaussfreire(at)gmail(dot)com> |
---|---|
To: | Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> |
Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Locking B-tree leafs immediately in exclusive mode |
Date: | 2018-06-14 13:56:17 |
Message-ID: | CAGTBQpZ7AmS=33Av_QEqtMSS0Fz7=uSKjL+Fk7CGbCmh8LC4Sw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Jun 14, 2018 at 9:44 AM Alexander Korotkov
<a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> > > Our B-tree is currently maintaining duplicates unordered. So, during insertion
> > > we can traverse rightlinks in order to find page, which would fit new
> > > index tuple.
> > > However, in this case we're traversing pages in exclusive lock mode, and
> > > that happens already after re-lock. _bt_search() doesn't do anything with that.
> > > So, I assume there shouldn't be any degradation in the case of many
> > > duplicate entries.
> >
> > BTW, I have a prototype patch that makes the keys at the leaf level
> > unique. It is actually an implementation of nbtree suffix truncation
> > that sometimes *adds* a new attribute in pivot tuples, rather than
> > truncating away non-distinguishing attributes -- it adds a special
> > heap TID attribute when it must. The patch typically increases fan-in,
> > of course, but the real goal was to make all entries at the leaf level
> > truly unique, as L&Y intended (we cannot do it without suffix
> > truncation because that will kill fan-in). My prototype has full
> > amcheck coverage, which really helped me gain confidence in my
> > approach.
>
> Could you, please, clarify what do you mean by "fan-in"? As I
> understood, higher "fan-in" means more children on single non-leaf
> page, and in turn "better branching". Is my understanding correct?
>
> If my understanding is correct, then making leaf level truly unique
> without suffix truncation will kill fan-in, because additional heap
> TID attribute will increase pivot tuple size. However, that doesn't
> look like inevitable shortcoming, because we could store heap TID in
> t_tid of pivot index tuples. Yes, we've used t_tid offset for storing
> number of attributes in truncated tuples. But heap TID is going to be
> virtually the last attribute of tuple. So, pivot tuples containing
> heap TID are not going to be suffix-truncated anyway. So, for such
> tuples we can just don't set INDEX_ALT_TID_MASK, and they would be
> assumed to have all the attributes.
I had a patch[1] that did pretty much just that. I saw no contention
issues or adverse effects, but my testing of it was rather limited.
The patch has rotted significantly since then, sadly, and it's quite complex.
> So, my idea that it's not necessary to couple suffix truncation with
> making leaf tuples unique.
No, but the work involved for one and the other shares so much that it
lends itself to be done in the same patch.
> Regarding just making leaf tuples unique,
> I understand that it wouldn't be always win. When we have a lot of
> duplicates, current implementation searches among the pages containing
> those duplicates for the first page containing enough of free space.
> While with unique index, we would have to always insert into
> particular page. Thus, current implementation gives us better filling
> of pages, and (probably) faster insertion.
Not at all. Insertion cost in unique indexes with lots of duplicates
(happens, dead duplicates) grows quadratically on the number of
duplicates, and that's improved by making the index unique and sorted.
> Therefore, my idea is that there is a tradeoff between making btree
> index unique or non-unique. I think we will need an option for that.
We will need a flag somewhere to avoid having to require index
rebuilds during pg_upgrade. My old patch maintained backward
compatibility, but when using an older version index, the sort order
of tids could not be assumed, and thus many optimizations had to be
disabled.
But it is totally doable, and necessary unless you accept a very
painful pg_upgrade.
From | Date | Subject | |
---|---|---|---|
Next Message | Alexander Korotkov | 2018-06-14 14:13:14 | Re: Locking B-tree leafs immediately in exclusive mode |
Previous Message | Robert Haas | 2018-06-14 13:50:33 | Re: Fix some error handling for read() and errno |