From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru> |
Subject: | Re: Making all nbtree entries unique by having heap TIDs participate in comparisons |
Date: | 2019-03-16 16:55:14 |
Message-ID: | CAH2-Wzko7dc9+7nzfKOJxYy00PXS3NsQc5XDt+dSeRgAYLDBkg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, Mar 16, 2019 at 9:17 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> Hmm. "re-find", maybe? We use that term in a few error messages already,
> to mean something similar.
WFM.
> At first, I thought this would be horrendously expensive, but thinking
> about it a bit more, nearby tuples will always follow the same path
> through the upper nodes, so it'll all be cached. So maybe it's not quite
> so bad.
That's deliberate, though you could call bt_relocate_from_root() from
anywhere else if you wanted to. It's a bit like a big nested loop
join, where the inner side has locality.
> Then, a cosmic ray changes the 20 on the root page to 18. That causes
> the the leaf tuple 19 to become non-re-findable; if you descend the for
> 19, you'll incorrectly land on page 6. But it also causes the high key
> on page 2 to be different from the downlink on the root page. Wouldn't
> the existing checks catch this?
No, the existing checks will not check that. I suppose something
closer to the existing approach *could* detect this issue, by making
sure that the "seam of identical high keys" from the root to the leaf
are a match, but we don't use the high key outside of its own page.
Besides, there is something useful about having the code actually rely
on _bt_search().
There are other similar cases, where it's not obvious how you can do
verification without either 1) crossing multiple levels, or 2)
retaining a "low key" as well as a high key (this is what Goetz Graefe
calls "retaining fence keys to solve the cousin verification problem"
in Modern B-Tree Techniques). What if the corruption was in the leaf
page 6 from your example, which had a 20 at the start? We wouldn't
have compared the downlink from the parent to the child, because leaf
page 6 is the leftmost child, and so we only have "-inf". The lower
bound actually comes from the root page, because we truncate "-inf"
attributes during page splits, even though we don't have to. Most of
the time they're not "absolute minus infinity" -- they're "minus
infinity in this subtree".
Maybe you could actually do something with the high key from leaf page
5 to detect the stray value "20" in leaf page 6, but again, we don't
do anything like that right now.
--
Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2019-03-16 17:32:14 | Re: Making all nbtree entries unique by having heap TIDs participate in comparisons |
Previous Message | Tom Lane | 2019-03-16 16:20:01 | Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view? |