| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
|---|---|
| To: | pgsql-hackers(at)postgreSQL(dot)org | 
| Subject: | Max size of a btree index entry | 
| Date: | 2006-07-11 14:02:49 | 
| Message-ID: | 17329.1152626569@sss.pgh.pa.us | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
Currently, we restrict btree index tuples to a size that ensures three of
them will fit on a page.  The motivation for this is the following two
considerations:
1. In a non-rightmost page, we need to include a "high key", or page
boundary key, that isn't one of the useful data keys.
2. In a non-leaf page, there had better be at least two child pages
(downlink entries), else we have failed to subdivide the page's key
range at all, and thus there would be a nonterminating recursion.
However: a non-leaf page actually has one more pointer than key,
eg a page with three children needs only two data keys:
---------------- entire key range assigned to page ------------------
-- range 1 --  boundary key -- range 2 --  boundary key -- range 3 --
     |                           |                           |
     v                           v                           v
child page 1               child page 2                 child page 3
We implement this by having the first data "tuple" on a non-leaf page
contain only a downlink TID and no key data, ie it's just the header.
So it appears to me that we could allow the maximum size of a btree
entry to be just less than half a page, rather than just less than
a third of a page --- the worst-case requirement for a non-leaf page
is not three real tuples, but one tuple header and two real tuples.
On a leaf page we might manage to fit only one real data item, but
AFAICS that doesn't pose any correctness problems.
Obviously a tree containing many such pages would be awfully inefficient
to search, but I think a more common case is that there are a few wide
entries in an index of mostly short entries, and so pushing the hard
limit up a little would add some flexibility with little performance
cost in real-world cases.
Have I missed something? Is this worth changing?
regards, tom lane
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Andrew Dunstan | 2006-07-11 14:07:43 | Re: pgsql-patches considered harmful | 
| Previous Message | Alvaro Herrera | 2006-07-11 13:59:51 | Re: poor performance with Context Switch Storm at TPC-W. |