From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PROPOSAL] Effective storage of duplicates in B-tree index. |
Date: | 2015-09-01 15:41:43 |
Message-ID: | 55E5C737.7040603@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 09/01/2015 11:31 AM, Alexander Korotkov wrote:
...
>
> Yes, In general GIN is a btree with effective duplicates handling +
> support of splitting single datums into multiple keys.
> This proposal is mostly porting duplicates handling from GIN to btree.
>
> Sure, there are differences - GIN indexes don't handle UNIQUE indexes,
>
>
> The difference between btree_gin and btree is not only UNIQUE feature.
> 1) There is no gingettuple in GIN. GIN supports only bitmap scans. And
> it's not feasible to add gingettuple to GIN. At least with same
> semantics as it is in btree.
> 2) GIN doesn't support multicolumn indexes in the way btree does.
> Multicolumn GIN is more like set of separate singlecolumn GINs: it
> doesn't have composite keys.
> 3) btree_gin can't effectively handle range searches. "a < x < b" would
> be hangle as "a < x" intersect "x < b". That is extremely inefficient.
> It is possible to fix. However, there is no clear proposal how to fit
> this case into GIN interface, yet.
>
> but the compression can only be effective when there are duplicate
> rows. So either the index is not UNIQUE (so the b-tree feature is
> not needed), or there are many updates.
>
> From my observations users can use btree_gin only in some cases. They
> like compression, but can't use btree_gin mostly because of #1.
Thanks for the explanation! I'm not that familiar with GIN internals,
but this mostly matches my understanding. I have only mentioned UNIQUE
because the lack of gettuple() method seems obvious - and it works fine
when GIN indexes are used as "bitmap indexes".
But you're right - we can't do index only scans on GIN indexes, which is
a huge benefit of btree indexes.
>
> Which brings me to the other benefit of btree indexes - they are
> designed for high concurrency. How much is this going to be affected
> by introducing the posting lists?
>
>
> I'd notice that current duplicates handling in PostgreSQL is hack over
> original btree. It is designed so in btree access method in PostgreSQL,
> not btree in general.
> Posting lists shouldn't change concurrency much. Currently, in btree you
> have to lock one page exclusively when you're inserting new value.
> When posting list is small and fits one page you have to do similar
> thing: exclusive lock of one page to insert new value.
> When you have posting tree, you have to do exclusive lock on one page of
> posting tree.
OK.
>
> One can say that concurrency would became worse because index would
> become smaller and number of pages would became smaller too. Since
> number of pages would be smaller, backends are more likely concur for
> the same page. But this argument can be user against any compression and
> for any bloat.
Which might be a problem for some use cases, but I assume we could add
an option disabling this per-index. Probably having it "off" by default,
and only enabling the compression explicitly.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2015-09-01 15:46:05 | Re: WIP: About CMake v2 |
Previous Message | David Fetter | 2015-09-01 15:37:17 | Re: perlcritic |