Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Юрий Соколов <funny(dot)falcon(at)gmail(dot)com>
Subject: Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Date: 2020-02-20 18:58:29
Message-ID: CAH2-Wzk8aMGF6ficoKZigjwGw07hPtXdvZjPCrms_tT8GKzG=A@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 20, 2020 at 7:38 AM Anastasia Lubennikova
<a(dot)lubennikova(at)postgrespro(dot)ru> wrote:
> I don't think this patch really needs more nitpicking )

But when has that ever stopped it? :-)

> User can discover this with a complex query to pg_index and pg_opclass.
> To simplify this, we can probably wrap this into function or some field
> in pg_indexes.

A function isn't a real user interface, though -- it probably won't be noticed.

I think that there is a good chance that it just won't matter. The
number of indexes that won't be able to support deduplication will be
very small in practice. The important exceptions are INCLUDE indexes
and nondeterministic collations. These exceptions make sense
intuitively, and will be documented as limitations of those other
features.

The numeric/float thing doesn't really make intuitive sense, and
numeric is an important datatype. Still, numeric columns and float
columns seem to rarely get indexed. That just leaves container type
opclasses, like anyarray and jsonb.

Nobody cares about indexing container types with a B-Tree index, with
the possible exception of expression indexes on a jsonb column. I
don't see a way around that, but it doesn't seem all that important.
Again, applications are unlikely to have more than one or two of
those. The *overall* space saving will probably be almost as good as
if the limitation did not exist.
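
As a rough illustration only (this is not code from the patch), the
exceptions listed above can be modelled as a simple eligibility check.
The names IndexSketch, UNSUPPORTED_KEY_TYPES and can_deduplicate are
hypothetical, and the set of type names is just a restatement of the
cases mentioned in this message, not something read from the catalogs:

```python
from dataclasses import dataclass
from typing import List

# Hypothetical: key column types this message names as unable to support
# deduplication (numeric, float, and container types such as anyarray/jsonb).
UNSUPPORTED_KEY_TYPES = {"numeric", "float4", "float8", "anyarray", "jsonb"}


@dataclass
class IndexSketch:
    """Toy description of a B-Tree index; not a PostgreSQL structure."""
    key_types: List[str]
    has_include_columns: bool = False
    nondeterministic_collation: bool = False


def can_deduplicate(index: IndexSketch) -> bool:
    """Return True if none of the exceptions from this message apply."""
    if index.has_include_columns:
        return False  # INCLUDE indexes are an exception
    if index.nondeterministic_collation:
        return False  # nondeterministic collations are an exception
    # Any single unsupported key column type rules out the whole index.
    return not any(t in UNSUPPORTED_KEY_TYPES for t in index.key_types)


print(can_deduplicate(IndexSketch(["int4", "text"])))
print(can_deduplicate(IndexSketch(["numeric"])))
```

Under these assumptions, an ordinary index on integer or text columns
qualifies, while a single numeric or jsonb key column disqualifies the
whole index.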

> Anyway, I would wait for feedback from pre-release testers.

Right -- let's delay making a final decision on it, just like the
decision to enable it by default. It will work this way in the
committed version, but that isn't supposed to be the final word on it.

--
Peter Geoghegan
