From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Юрий Соколов <funny(dot)falcon(at)gmail(dot)com>
Subject: Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Date: 2020-01-15 02:08:57
Message-ID: CAH2-Wz=Tr6mxMsKRmv_=9-05_O9QWqOzQ8GweRV2DXS6+Y38QQ@mail.gmail.com
Lists: pgsql-hackers
On Fri, Jan 10, 2020 at 1:36 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Still, v29 doesn't resolve the following points you've raised, where I
> haven't reached a final opinion on what to do myself. These items are
> as follows (I'm quoting your modified patch file sent on January 8th
> here):
Still no progress on these items, but I am now posting v30. A new
version seems warranted, because I now want to revive a patch from a
couple of years back as part of the deduplication project -- it would
be good to get feedback on that sooner rather than later. This is a
patch that you [Heikki] are already familiar with -- the patch to
speed up compactify_tuples() [1]. Sokolov Yura is CC'd here, since he
is the original author.
The deduplication patch is much faster with this in place. For
example, with v30:
pg(at)regression:5432 [25216]=# create unlogged table foo(bar int4);
CREATE TABLE
pg(at)regression:5432 [25216]=# create index unlogged_foo_idx on foo(bar);
CREATE INDEX
pg(at)regression:5432 [25216]=# insert into foo select g from
generate_series(1, 1000000) g, generate_series(1,10) i;
INSERT 0 10000000
Time: 17842.455 ms (00:17.842)
If I revert the "Bucket sort for compactify_tuples" commit locally,
then the same insert statement takes 31.614 seconds! In other words,
the insert statement is made ~77% faster by that commit alone. The
improvement is stable and reproducible.
Clearly there is a big compactify_tuples() bottleneck that comes from
PageIndexMultiDelete(). The hot spot is quite visible with "perf top
-e branch-misses".
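For anyone who hasn't looked at the 2017/2018 patch, the core idea is that item offsets within a page are bounded by the block size, so a bucket sort can replace the comparator-driven qsort() (and its branch mispredictions). Here is a minimal, self-contained sketch of that idea only -- ItemData, N_BUCKETS, and bucket_sort_items() are illustrative names, not the names used in the actual patch, and the real code sorts in whatever order makes the subsequent tuple moves safe:

```c
#include <assert.h>

#define PAGE_SIZE 8192          /* BLCKSZ: page offsets are < 8192 */
#define N_BUCKETS 32
#define BUCKET_WIDTH (PAGE_SIZE / N_BUCKETS)

typedef struct
{
    unsigned short offset;      /* tuple's current offset within the page */
    unsigned short len;
} ItemData;

/*
 * Sort items by page offset using a two-pass bucket sort instead of
 * qsort().  Because offsets are bounded by the page size, items can be
 * distributed into buckets with no comparator calls at all; only the
 * cheap final cleanup pass over the nearly-sorted output compares keys.
 */
static void
bucket_sort_items(const ItemData *items, ItemData *sorted, int nitems)
{
    int         start[N_BUCKETS + 1] = {0};

    /* Pass 1: count how many items fall into each bucket */
    for (int i = 0; i < nitems; i++)
        start[items[i].offset / BUCKET_WIDTH + 1]++;

    /* Turn counts into each bucket's starting index (prefix sums) */
    for (int b = 1; b <= N_BUCKETS; b++)
        start[b] += start[b - 1];

    /* Pass 2: scatter items into their buckets */
    for (int i = 0; i < nitems; i++)
        sorted[start[items[i].offset / BUCKET_WIDTH]++] = items[i];

    /* Insertion sort: items are already grouped by bucket, so this is cheap */
    for (int i = 1; i < nitems; i++)
    {
        ItemData    tmp = sorted[i];
        int         j = i;

        while (j > 0 && sorted[j - 1].offset > tmp.offset)
        {
            sorted[j] = sorted[j - 1];
            j--;
        }
        sorted[j] = tmp;
    }
}
```

The point is that the hot distribution passes have no data-dependent branches, which is consistent with the branch-misses hot spot showing up in perf.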
The compactify_tuples() patch stalled because it wasn't clear if it
was worth the trouble at the time. It was originally written to
address a much smaller PageRepairFragmentation() bottleneck in heap
pruning. ISTM that deduplication alone is a good enough reason to
commit this patch. I haven't really changed anything about the
2017/2018 patch -- I need to do more review of that. We probably don't
need the qsort() inlining stuff (the bucket sort thing is the real
win), but I included it in v30 all the same.
Other changes in v30:
* We now avoid extra _bt_compare() calls within _bt_check_unique() --
no need to call _bt_compare() once per TID (once per equal tuple is
quite enough).
This is a noticeable performance win, even though the change was
originally intended to make the logic in _bt_check_unique() clearer.
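Schematically, the change looks like the following. This is a made-up sketch, not the actual _bt_check_unique() code -- PostingTuple, compare_keys(), and count_equal_tids() are hypothetical names -- but it shows why the comparator can be hoisted out of the per-TID loop once many TIDs share a single posting list tuple's key:

```c
#include <assert.h>

static int  ncompares = 0;

/* Stand-in for _bt_compare(): counts how often it is called */
static int
compare_keys(int tuplekey, int searchkey)
{
    ncompares++;
    return (tuplekey > searchkey) - (tuplekey < searchkey);
}

typedef struct
{
    int     key;                /* the (deduplicated) key value */
    int     ntids;              /* number of heap TIDs in the posting list */
} PostingTuple;

/*
 * Visit every TID whose tuple matches searchkey.  The comparator runs
 * once per tuple -- not once per TID -- which is the shape of the v30
 * change described above.
 */
static int
count_equal_tids(const PostingTuple *tuples, int n, int searchkey)
{
    int     matches = 0;

    for (int i = 0; i < n; i++)
    {
        if (compare_keys(tuples[i].key, searchkey) != 0)
            continue;           /* one comparison per equal tuple */
        for (int t = 0; t < tuples[i].ntids; t++)
            matches++;          /* per-TID work needs no key comparison */
    }
    return matches;
}
```

With three tuples of 100 TIDs each, the comparator runs 3 times rather than 300.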
* Reduced the limit on the size of a posting list tuple to 1/6 of a
page -- down from 1/3.
This seems like a good idea on the grounds that it keeps our options
open if we split a page full of duplicates due to UPDATEs rather than
INSERTs (i.e. we split a page full of duplicates that isn't also the
rightmost page among pages that store only those duplicates). A lower
limit is more conservative, and yet doesn't cost us that much space.
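Some back-of-the-envelope numbers make the "doesn't cost us that much space" point concrete. This assumes the standard 8KB BLCKSZ and 6-byte heap TIDs; the real limit calculation also has to account for tuple headers and alignment, which this sketch ignores:

```c
#include <assert.h>

#define BLCKSZ      8192
#define TID_SIZE    6           /* sizeof(ItemPointerData) */

/* Old and new caps on the size of a single posting list tuple */
static const int old_limit = BLCKSZ / 3;    /* 2730 bytes */
static const int new_limit = BLCKSZ / 6;    /* 1365 bytes */

/*
 * Even at the lower limit, one posting list tuple can still absorb on
 * the order of two hundred duplicate TIDs, so the deduplication win is
 * largely preserved.
 */
static const int tids_at_new_limit = BLCKSZ / 6 / TID_SIZE;  /* 227 */
```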
* Refined nbtsort.c/CREATE INDEX to work sensibly with non-standard
fillfactor settings.
This last item is a minor bugfix, really.
[1] https://commitfest.postgresql.org/14/1138/
--
Peter Geoghegan
| Attachment | Content-Type | Size |
|---|---|---|
| v30-0001-Add-deduplication-to-nbtree.patch | application/octet-stream | 216.2 KB |
| v30-0002-Header-for-customized-qsort.patch | application/octet-stream | 21.1 KB |
| v30-0003-Bucket-sort-for-compactify_tuples.patch | application/octet-stream | 3.7 KB |
| v30-0004-Teach-pageinspect-about-nbtree-posting-lists.patch | application/octet-stream | 18.5 KB |
| v30-0005-DEBUG-Show-index-values-in-pageinspect.patch | application/octet-stream | 4.4 KB |