From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)heroku(dot)com> |
Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Stating the significance of Lehman & Yao in the nbtree README |
Date: | 2014-07-22 04:55:09 |
Message-ID: | CAA4eK1K1GMocZY=TQCdkf0ucDMTdqBFqT5OafidM6rch9gQCcg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, May 27, 2014 at 1:52 AM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>
> While talking to various people during pgCon, I was reminded that the
> nbtree README does a poor job of explaining the actual practical
> advantages of L&Y from a high level. The "move right" trick is
> initially mentioned only as an adjunct to a discussion of the
> special-case handling of root page splits, even though it's of huge
> significance. We also say this within nbtinsert.c:
>
> /*
> * If the page was split between the time that we surrendered our read
> * lock and acquired our write lock, then this page may no longer be the
> * right place for the key we want to insert. In this case, we need to
> * move right in the tree. See Lehman and Yao for an excruciatingly
> * precise description.
> */
>
> (Why we need to say something about _bt_moveright() within
> _bt_doinsert() that is equally true of both _bt_moveright() callers
> isn't clear).
There is a mention about the race condition where it needs to move right
in another caller (_bt_search) of _bt_moveright() as well.
/*
* Race -- the page we just grabbed may have split since we read its
* pointer in the parent (or metapage). If it has, we may need to
* move right to its new sibling. Do that.
..
Do you think there is more to what is already mentioned on top of second
caller which we should add or you think if it is true for both, then it
should
be on top of _bt_moveright()?
> I think that this isn't enough. Attached patch proposes to add a small
> paragraph at the top of the nbtree README, to clarify the advantages
> of L&Y from a high level. I don't think it's appropriate to state the
> advantages of an algorithm in a README file generally, but in this
> instance it makes it easier to understand the algorithm.
In general, I agree with you that we should mention about any advantage
of the algorithm we are using and especially if it is significant. I think
it
will be better if can also mention how that advantage or use is realized
in our implementation as we are already doing in README of nbtree.
+
+ Even with these read locks, Lehman and Yao's approach obviates the
+ need of earlier schemes to hold multiple read locks concurrently when
+ descending the tree as part of servicing index scans (pessimistic lock
+ coupling). The addition of right-links at all levels, as well as the
+ addition of a page "high key" allows detection of, and dynamic
+ recovery from concurrent page splits (that is, splits between
+ unlocking an internal page, and subsequently locking its child page
+ during a descent). L&Y Trees are sometimes referred to as "B-Link
+ trees" in the literature.
+
The above indicates 2 things:
a. L & Y doesn't need to hold read locks concurrently.
b. Advantage of right-links at all levels and "high-key".
As per my understanding, we are not following point (a) in our code,
so what is the benefit we get by having a reference of same in README?
Isn't it better if we mention how the point (b) is used in our code and
it's advantage together rather than keeping it at top of README?
Already README mentions in brief about right-link and how it is used
as below:
".. The scan must remember the page's right-link at the time it was scanned,
since that is the page to move right to; if we move right to the current
right-link then we'd re-scan any items moved by a page split. ..."
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Fabien COELHO | 2014-07-22 04:58:57 | Re: small doccumentation fix in psql |
Previous Message | Tom Lane | 2014-07-22 04:40:22 | Re: IS NOT DISTINCT FROM + Indexing |