From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Enabling B-Tree deduplication by default
Date: 2020-01-29 18:41:39
Message-ID: CA+Tgmob5WsrvtCHDxwOuTnFT5=-xCP89zFaXmBpesazvaF6KAw@mail.gmail.com
Lists: pgsql-hackers
On Wed, Jan 29, 2020 at 1:15 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> The good news is that these extra cycles aren't very noticeable even
> with a workload where deduplication doesn't help at all (e.g. with
> several indexes on an append-only table, and few or no duplicates). The
> cycles are generally a fixed cost. Furthermore, it seems to be
> possible to virtually avoid the problem in the case of unique indexes
> by applying the incoming-item-is-duplicate heuristic. Maybe I am
> worrying over nothing.
Yeah, maybe. I'm tempted to advocate for dropping the GUC and keeping
the reloption. If the worst case is a 3% regression and you expect
that to be rare, I don't think a GUC is really worth it, especially
given that the proposed semantics seem somewhat confusing. The
reloption can be used in a pinch to protect against either bugs or
performance regressions, whichever may occur, and it doesn't seem like
you need a second mechanism.
> Again, maybe I'm making an excessively thin distinction. I really want
> to be able to enable the feature everywhere, while also not getting
> even one complaint about it. Perhaps that's just not a realistic or
> useful goal.
One thing that you could do is try to learn whether deduplication (I
really don't like that name, but here we are) seems to be working for
a given index, perhaps even in a given session. For instance, suppose
you keep track of what happened the last ten times the current session
attempted deduplication within a given index. Store the state in the
relcache. If all of the last ten tries were failures, then only try
1/4 of the time thereafter. If you have a success, go back to trying
every time. That's pretty crude, but it might be good enough to
blunt the downsides to the point where you can stop worrying.
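
To make that concrete, here is a rough sketch of the backoff logic I
have in mind. All of the names here are made up, and in reality the
state would presumably hang off the index's relcache entry rather than
being a standalone struct; this is just to illustrate the idea:

#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical per-index backoff state; imagine one of these attached
 * to the index's relcache entry.  Nothing like this exists in the
 * patch under discussion -- names are illustrative only.
 */
typedef struct DedupBackoffState
{
    int      consecutive_failures; /* dedup attempts since the last success */
    uint32_t skip_counter;         /* counts attempts while backing off */
} DedupBackoffState;

#define DEDUP_FAILURE_THRESHOLD 10 /* failures before we start backing off */
#define DEDUP_BACKOFF_DIVISOR   4  /* while backing off, try 1 time in 4 */

/*
 * Should this session attempt deduplication on the index right now?
 * After ten consecutive failures, try only every fourth time; a single
 * success (recorded below) resets us to trying on every insertion.
 */
static bool
dedup_should_try(DedupBackoffState *state)
{
    if (state->consecutive_failures < DEDUP_FAILURE_THRESHOLD)
        return true;

    return (state->skip_counter++ % DEDUP_BACKOFF_DIVISOR) == 0;
}

/* Record the outcome of a deduplication attempt. */
static void
dedup_record_result(DedupBackoffState *state, bool success)
{
    if (success)
    {
        /* One success puts us back to trying on every insertion. */
        state->consecutive_failures = 0;
        state->skip_counter = 0;
    }
    else
        state->consecutive_failures++;
}

Resetting on a single success is deliberately optimistic: if the
workload shifts and duplicates start showing up again, a session in
backoff skips at most a few opportunities before it is back to trying
every time.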
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company