From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru> |
Subject: | Re: Making all nbtree entries unique by having heap TIDs participate in comparisons |
Date: | 2019-03-11 00:17:20 |
Message-ID: | CAH2-WzkOzy914tP1RHAGJxXK8htxoLS0FXsR0HL9kPU3e5xTZA@mail.gmail.com |
Lists: | pgsql-hackers |
On Sun, Mar 10, 2019 at 1:11 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > > The idea with pg_upgrade'd v3 indexes is, as I said a while back, that
> > > they too have a heap TID attribute. nbtsearch.c code is not allowed to
> > > rely on its value, though, and must use
> > > minusinfkey/searching_for_pivot_tuple semantics (relying on its value
> > > being minus infinity is still relying on its value being something).
> >
> > Yeah. I find that's a complicated way to think about it. My mental model
> > is that v4 indexes store heap TIDs, and every tuple is unique thanks to
> > that. But on v3, we don't store heap TIDs, and duplicates are possible.
>
> I'll try it that way, then.
Attached is v16, which does it that way instead. There are simpler
comments, still located within _bt_compare(). These are based on your
suggested wording, with some changes. I think that I prefer it this
way too. Please let me know what you think.
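To make the simpler mental model concrete, here is a minimal sketch. It is not code from the patch: DemoTid, DemoTuple, demo_compare() and the tid_is_key flag are hypothetical names, and a single integer key stands in for the index's key columns. The assumption is that on v4 the heap TID is compared as a trailing tie-breaker, while on v3 it is ignored, so duplicates remain possible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; these are not the real nbtree structs. */
typedef struct DemoTid
{
    uint32_t block;   /* heap block number */
    uint16_t offset;  /* line pointer offset within the block */
} DemoTid;

typedef struct DemoTuple
{
    int32_t key;      /* single user-visible key column */
    DemoTid tid;      /* heap TID the index tuple points to */
} DemoTuple;

/* Order TIDs by block number first, then by offset. */
static int
demo_tid_cmp(DemoTid a, DemoTid b)
{
    if (a.block != b.block)
        return a.block < b.block ? -1 : 1;
    if (a.offset != b.offset)
        return a.offset < b.offset ? -1 : 1;
    return 0;
}

/*
 * Compare two tuples.  With tid_is_key set (the assumed v4 behavior) the
 * heap TID breaks ties, so two distinct tuples never compare as equal.
 * Without it (the assumed v3 behavior) equal keys are reported as
 * duplicates.
 */
static int
demo_compare(const DemoTuple *a, const DemoTuple *b, bool tid_is_key)
{
    assert(a != NULL && b != NULL);

    if (a->key != b->key)
        return a->key < b->key ? -1 : 1;
    if (!tid_is_key)
        return 0;
    return demo_tid_cmp(a->tid, b->tid);
}
```

With two tuples that share a key but point to different heap TIDs, demo_compare() reports equality when tid_is_key is false and a strict ordering when it is true.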
Other changes:
* nbtsplitloc.c failed to consider the full range of values in the
split interval when deciding perfect penalty. It considered from the
middle to the left or right edge, rather than from the left edge to
the right edge. This didn't seem to really affect the quality of its
decisions very much, but it was still wrong. This is fixed by a new
function that determines the left and right edges of the split
interval -- _bt_interval_edges().
* We now record the smallest observed tuple during our pass over the
page to record split points. This is used by internal page splits, to
get a more useful "perfect penalty", saving cycles in the common case
where there isn't much variability in the size of tuples on the page
being split. The same field is used within the "split after new item"
optimization as a further crosscheck -- it's now impossible to fool it
into thinking that the page has equisized tuples.
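As a rough illustration of the first item above, finding the edges of a split interval might look like the sketch below. This is not the patch's _bt_interval_edges(). DemoSplit and demo_interval_edges() are hypothetical, and the sketch assumes that each candidate split point records the page offset of the first tuple that would go to the right half, and that the candidates in the interval are not stored in offset order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical candidate split point; not the patch's actual struct. */
typedef struct DemoSplit
{
    int first_right_off;  /* offset of first tuple placed on the right page */
    int delta;            /* free space imbalance; lower is better */
} DemoSplit;

/*
 * Find the leftmost and rightmost candidates in the interval by page
 * offset.  Scanning the whole array covers the full range from the left
 * edge to the right edge, rather than from the middle to one edge.
 */
static void
demo_interval_edges(const DemoSplit *splits, size_t nsplits,
                    const DemoSplit **left, const DemoSplit **right)
{
    assert(splits != NULL && nsplits > 0);

    *left = *right = &splits[0];
    for (size_t i = 1; i < nsplits; i++)
    {
        if (splits[i].first_right_off < (*left)->first_right_off)
            *left = &splits[i];
        if (splits[i].first_right_off > (*right)->first_right_off)
            *right = &splits[i];
    }
}
```

A caller would then evaluate the penalty across the tuples between the two returned edges, instead of stopping at the middle candidate.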
The regression that I mentioned earlier isn't in pgbench type
workloads (even when the distribution is something more interesting
than the uniform distribution default). It is only in workloads with
lots of page splits and lots of index churn, where we get most of the
benefit of the patch, but also where the costs are most apparent.
Hopefully it can be fixed, but if not I'm inclined to think that it's
a price worth paying. This certainly still needs further analysis and
discussion, though. This revision of the patch does not attempt to
address that problem in any way.
--
Peter Geoghegan
Attachment | Content-Type | Size |
---|---|---|
v16-0007-DEBUG-Add-pageinspect-instrumentation.patch | application/octet-stream | 7.8 KB |
v16-0004-Allow-tuples-to-be-relocated-from-root-by-amchec.patch | application/octet-stream | 15.2 KB |
v16-0003-Consider-secondary-factors-during-nbtree-splits.patch | application/octet-stream | 51.9 KB |
v16-0002-Make-heap-TID-a-tie-breaker-nbtree-index-column.patch | application/octet-stream | 153.8 KB |
v16-0001-Refactor-nbtree-insertion-scankeys.patch | application/octet-stream | 57.9 KB |
v16-0005-Add-split-after-new-tuple-optimization.patch | application/octet-stream | 12.3 KB |
v16-0006-Add-high-key-continuescan-optimization.patch | application/octet-stream | 8.6 KB |