From: | Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com> |
Subject: | Re: WIP: Covering + unique indexes. |
Date: | 2018-03-26 10:10:30 |
Message-ID: | CAPpHfduZrxM81gQ5jRqL5+EMUAmfb8GXDgOz9tKGrFttFw4OPg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, Mar 25, 2018 at 1:47 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I was going to say that you could just treat the low bit in the t_tid
> offset as representing "see catalog entry". My first idea was that
> nothing would have to change about the existing format, since internal
> page items already have only the low bit set within their offset.
> However, I now see that that won't really work, because we don't
> change the offset in high keys when they're copied from a real item
> during a page split. Whatever we do, it has to work equally well for
> all "separator keys" -- that is, it must work for both downlinks in
> internal pages, and all high keys (including high keys at the leaf
> level).
>
OK.
> A good solution is to use the unused 13th t_bit. If hash can have a
> INDEX_MOVED_BY_SPLIT_MASK, then nbtree can have a INDEX_ALT_TID_MASK.
> This avoids a BTREE_VERSION bump, and allows us to deal with the
> highkey offset issue. Actually, it's even more flexible than that --
> it can work with ordinary leaf tuples in the future, too. That is, we
> can eventually implement prefix truncation and deduplication at the
> leaf level using this representation, since there is nothing that
> limits INDEX_ALT_TID_MASK IndexTuples to "separator keys".
>
> The main difference between this approach to leaf prefix
> truncation/compression/deduplication, and the GIN entry tree's posting
> list representation would be that it wouldn't have to be
> super-optimized for duplicates, at the expense of more common case for
> regular nbtree indexes -- having few or no duplicates. A btree_gin
> index on pgbench_accounts(aid) looks very similar to an equivalent
> nbtree index if you just compare internal pages from each, but they
> look quite different at the leaf level, where GIN has 24 byte
> IndexTuples instead of 16 bytes IndexTuples. Of course, this is
> because the leaf pages have posting lists that can never be simple
> heap pointer TIDs.
>
Right, btree_gin is much smaller than regular btree when there are a lot
of duplicates. When there is no duplicates then btree_gin becomes larger
than regular btree, because gin stores single item pointer less compact
than btree.
A secondary goal of this INDEX_ALT_TID_MASK representation should be
> that it won't even be necessary to know that an IndexTuple is
> contained within a leaf page rather than an index page (again, unlike
> GIN). I'm pretty confident that we can have a truly universal
> IndexTuple representation for nbtree, while supporting all of these
> standard optimizations.
>
> Sorry for going off in a tangent, but I think it's somewhat necessary
> to have a strategy here. Of course, we don't have to get everything
> right now, but we should be looking in this direction whenever we talk
> about on-disk nbtree changes.
>
So, as I get you're proposing to introduce INDEX_ALT_TID_MASK flag
which would indicate that we're storing something special in the t_tid
offset. And that should help us not only for covering indexes, but also for
further btree enhancements including suffix truncation. What exactly do
you propose to store into t_tid offset when INDEX_ALT_TID_MASK flag
is set? Is it number of attributes in this particular index tuple?
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
From | Date | Subject | |
---|---|---|---|
Next Message | Dean Rasheed | 2018-03-26 10:31:01 | Re: [HACKERS] PATCH: multivariate histograms and MCV lists |
Previous Message | Simon Riggs | 2018-03-26 09:53:42 | Re: [HACKERS] MERGE SQL Statement for PG11 |