Re: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Christensen <david(at)pgguru(dot)net>
Subject: Re: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)
Date: 2024-08-13 18:39:10
Message-ID: CAH2-Wzm5M+Djb8dJnGdiP8dwN7fZDNR-PSwNekGvKxa9-a_9cQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 6, 2024 at 5:42 PM Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
> Attached is version 16 now.

I ran this with my old land registry benchmark, used for validating
the space utilization impact of nbtree deduplication (among other
things). This isn't obviously the best benchmark for this sort of
thing, but I seem to recall that you'd used it yourself at some point.
To validate work in this area, likely including this patch. So I
decided to start there.

To be clear, this test involves bulk loading of an unlogged table (the
land registry table). The following composite index is created on the
table before we insert any rows, so most of the cycles here are in
index maintenance including _bt_search descents:

CREATE INDEX composite ON land2 USING btree (county COLLATE "C", city
COLLATE "C", locality COLLATE "C");

I wasn't able to see much of an improvement with this patch applied.
It went from ~00:51.598 to ~00:51.053. That's a little disappointing,
given that this is supposed to be a sympathetic case for the patch.
Can you suggest something else? (Granted, I understand that this patch
has some complicated relationship with other patches of yours, which I
don't understand currently.)

I'm a bit worried about side-effects for this assertion:

@@ -485,7 +489,7 @@ _bt_check_unique(Relation rel, BTInsertState
insertstate, Relation heapRel,
Assert(insertstate->bounds_valid);
Assert(insertstate->low >= P_FIRSTDATAKEY(opaque));
Assert(insertstate->low <= insertstate->stricthigh);
- Assert(_bt_compare(rel, itup_key, page, offset) < 0);
+ Assert(_bt_compare(rel, itup_key, page, offset, &sprefix) < 0);
break;
}

More generally, it's not entirely clear how the code in
_bt_check_unique is supposed to work with the patch. Why should it be
safe to do what you're doing with the prefix there? It's not like
we're doing a binary search here -- it's more like a linear search.

> - Move from 1-indexed AttributeNumber to 0-indexed ints for prefixes,
> and use "prefix" as naming scheme, rather than cmpcol. A lack of
> prefix, previously indicated with a cmpcol value of 1, is now a prefix
> value of 0.

Found a likely-related bug in the changes you made to amcheck, which I
was able to fix myself like so:

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index c7dc6725a..15be61777 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -3187,7 +3187,7 @@ bt_rootdescend(BtreeCheckState *state, IndexTuple itup)
insertstate.buf = lbuf;

/* Get matching tuple on leaf page */
- offnum = _bt_binsrch_insert(state->rel, &insertstate, 1);
+ offnum = _bt_binsrch_insert(state->rel, &insertstate, 0);
/* Compare first >= matching item on leaf page, if any */

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2024-08-13 18:39:11 Re: collect_corrupt_items_vacuum.patch
Previous Message Etsuro Fujita 2024-08-13 18:35:33 Re: Cross-version Compatibility of postgres_fdw