Re: [HACKERS] Bizarre coding in _bt_binsrch

From: Vadim Mikheev <vadim(at)krs(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Bizarre coding in _bt_binsrch
Date: 1999-06-06 13:32:36
Message-ID: 375A7874.8D48D798@krs.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane wrote:
>
> The full-width case appears to assume that that can't happen: if we
> have a given key in an upper page, there can be *no* equal keys in
> subpages to its left. That's a rather strong assumption about how
> page splitting is done; is it correct?
>
> Even more to the point, *should* it be correct? If we always returned
> the last lesser key, then the code would work for any correctly
> sequenced b-tree, but this code looks like it works only if splits occur
> only at the leftmost of a group of equal keys. If there are a lot of
> equal keys, that could result in a badly unbalanced tree, no? Maybe
> that's the real reason why performance seems to be so bad for many
> equal keys... maybe the split algorithm needs to be relaxed?

Our btree-s use Lehman-Yao algorithm which works in assumption
that there is no duplicates at all. It's just reminding.
It was ~ 2 years ago when I changed duplicates handling
to fix some rare bugs (this is why you see BTP_CHAIN stuff
there) and now I don't remember many things and so I can't
comment. But after I changed btree-s I learned how Oracle
handles duplicates problem - it just uses heap tuple id
as (last) part of index key! So simple! Unfortunately,
I had not time to re-implement btree-s in this way.
But this would:

1. get rid of duplicates problem;
2. simplify code (BTP_CHAIN stuff would be removed);
3. order index tuples in such way that in scan heap pages
would be read sequentially (from up of file to down);
4. speed up finding index tuple which corresponds to heap one
(good for index cleaning up).

The main problem is just programistic: you will have to add
heap tid to the end of index tuples on internal index pages,
but on leaf pages heap tid is in the begin of index tuples
(inside of btitem struct).

So, if you're going to change btree, please consider ability
to implement above.

Vadim

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Oleg Broytmann 1999-06-06 14:08:22 INSERT into VIEW
Previous Message Vadim Mikheev 1999-06-06 12:41:52 Re: [HACKERS] Priorities for 6.6