Quick Links

Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

From:	Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Юрий Соколов <funny(dot)falcon(at)gmail(dot)com>
Subject:	Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Date:	2020-02-19 16:14:03
Message-ID:	ed6caece-5a64-ad02-2134-1e71fb4b35a1@postgrespro.ru
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 14.02.2020 05:57, Peter Geoghegan wrote:

> Attached is v33, which adds the last piece we need: opclass
> infrastructure that tells nbtree whether or not deduplication can be
> applied safely. This is based on work by Anastasia that was shared
> with me privately.
Thank you for this work. I've looked through the patches and they seem
to be ready for commit.
I haven't yet read recent documentation and readme changes, so maybe
I'll send some more feedback tomorrow.

Is there any specific reason, why we need separate btnameequalimage,
bpchar_equalimage and bttextequalimage functions?
As far as I see, they have the same implementation.
> Since using deduplication is supposed to pretty much be the norm from
> now on, it seemed like it might make sense to add a NOTICE about it
> during CREATE INDEX -- a notice letting the user know that it isn't
> being used due to a lack of opclass support:
>
> regression=# create table foo(bar numeric);
> CREATE TABLE
> regression=# create index on foo(bar);
> NOTICE: index "foo_bar_idx" cannot use deduplication
> CREATE INDEX
>
> Note that this NOTICE isn't seen with an INCLUDE index, since that's
> expected to not support deduplication.
>
> I have a feeling that not everybody will like this, which is why I'm
> pointing it out.
>
> Thoughts?

I would simply move it to debug level for all cases. Since from user's
perspective it doesn't differ that much from the case where
deduplication is applicable in general, but not very efficient due to
data distribution.
I also noticed that this is not consistent with ALTER index. For
example, alter index idx_n set (deduplicate_items =true); won't show any
message about deduplication.

I've tried several combinations with an index on a numeric column:

1) postgres=# create index idx_nd on tbl (n) with (deduplicate_items =
true);
NOTICE: index "idx_nd" cannot use deduplication
CREATE INDEX

Here the message seems appropriate. I don't think, we should restrict
creation of the index even when deduplicate_items parameter is set
explicitly, rather we may warn the user that it won't be efficient.

2) postgres=# create index idx_n on tbl (n) with (deduplicate_items =
false);
NOTICE: index "idx_n" cannot use deduplication
CREATE INDEX

In this case the message seems slightly strange to me.
Why should we show a notice about the fact that deduplication is not
possible if that is exactly what was requested?

3)
postgres=# create index idx on tbl (n);
NOTICE: index "idx" cannot use deduplication

In my opinion, this message is too specific for default behavior. It
exposes internal details without explanation and may look to user like
something went wrong.

In response to

Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index. at 2020-02-14 02:57:47 from Peter Geoghegan

Responses

Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index. at 2020-02-19 19:16:12 from Peter Geoghegan

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Juan José Santamaría Flecha	2020-02-19 16:16:32	Re: BUG #16108: Colorization to the output of command-line has unproperly behaviors at Windows platform
Previous Message	Masahiko Sawada	2020-02-19 15:44:27	Re: Internal key management system