Re: prefix btree implementation

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: prefix btree implementation
Date: 2005-10-07 14:04:15
Message-ID: 200510071404.j97E4Fu20249@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


OK, TODO updated:

* Consider compressing indexes by storing key values duplicated in
several rows as a single index entry

This is difficult because it requires datatype-specific knowledge.

---------------------------------------------------------------------------

Simon Riggs wrote:
> On Thu, 2005-10-06 at 22:43 -0400, Bruce Momjian wrote:
> > Jim C. Nasby wrote:
> > > On Wed, Oct 05, 2005 at 03:40:43PM -0700, Qingqing Zhou wrote:
> > > > We do the prefix sharing when we build up index only, never on the fly.
> > >
> > > So are you saying that inserts of new data wouldn't make any use of
> > > this? ISTM that greatly reduces the usefulness, though I'm not objecting
> > > because compression during build is probably better than none at all. Is
> > > there a technical reason compression can't be used during normal
> > > operations?
> >
> > Added to TODO:
> >
> > * Consider compressing indexes by storing key prefix values shared by
> > several rows as a single index entry
>
> Just to re-iterate Tom's point. There isn't any easy way of doing this
> in an object-relational database where we can make almost no assumptions
> about particular datatypes. There is no definition of datatype prefix...
> The best you could do would be to introduce a new API that allows a
> datatype to provide a shorter prefix value when given a starting data
> value, but then you'd need to write a whole set of prefixing functions.
> But that is almost identical to the idea of functional indexes anyhow,
> so I see no value in providing a second way of doing this when the
> existing way can be more easily modified to do this.
>
> I do not think key prefixing should be on the TODO for PostgreSQL,
> unless we also agree that datatype independence should not be maintained
> in all cases and can be relaxed for built-in datatypes.
>
> I suggest we reword that to "Investigate techniques for compressing
> indexes that will work successfully with datatype independence".
>
> What we might consider is having the index store a chain of pointers as
> an array on the index row, rather than having each row map to one index
> row. That technique would considerably reduce index volume for non-
> unique indexes by removing lots of index tuple overhead. But you would
> need to keep at most N pointers on a row. This technique is used by
> Teradata's Non Unique Secondary Index design.
>
> Best Regards, Simon Riggs
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Emil Briggs 2005-10-07 14:10:54 Re: Some spinlock patch tests
Previous Message Tom Lane 2005-10-07 13:52:03 Re: Some spinlock patch tests