Re: CLUSTER and indisclustered

From: Hannu Krosing <hannu(at)tm(dot)ee>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Curt Sampson <cjs(at)cynic(dot)net>, mark Kirkwood <markir(at)slithery(dot)org>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: CLUSTER and indisclustered
Date: 2002-08-07 15:13:54
Message-ID: 1028733234.13418.113.camel@taru.tm.ee
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 2002-08-07 at 15:26, Tom Lane wrote:
> Hannu Krosing <hannu(at)tm(dot)ee> writes:
> > Now I remembered my original preference for page bitmaps (vs. tuple
> > bitmaps): one can't actually make good use of a bitmap of tuples because
> > there is no fixed tuples/page ratio and thus no way to quickly go from
> > bit position to actual tuple. You mention the same problem but propose a
> > different solution.
>
> > Using page bitmap, we will at least avoid fetching any unneeded pages -
> > essentially we will have a sequential scan over possibly interesting
> > pages.
>
> Right. One form of the "lossy compression" idea I suggested is to
> switch from a per-tuple bitmap to a per-page bitmap once the bitmap gets
> too large to work with.

If it is a real bitmap, should it not be easyeast to allocate at the
start ?

a page bitmap for a 100 000 000 tuple table with 10 tuples/page will be
sized 10000000/8 = 1.25 MB, which does not look too big for me for that
amount of data (the data table itself would occupy 80 GB).

Even having the bitmap of 16 bits/page (with the bits 0-14 meaning
tuples 0-14 and bit 15 meaning "seq scan the rest of page") would
consume just 20 MB of _local_ memory, and would be quite justifyiable
for a query on a table that large.

For a real bitmap index the tuples-per-page should be a user-supplied
tuning parameter.

> Again, one could imagine doing that only in denser areas of the bitmap.

I would hardly call the resulting structure "a bitmap" ;)

And I'm not sure the overhead for a more complex structure would win us
any additional performance for most cases.

> > But I guess that CLUSTER support for INSERT will not be touched for 7.3
> > as will real bitmap indexes ;)
>
> All of this is far-future work I think.

After we do that we will probably be able claim support for
"datawarehousing" ;)

> Adding a new scan type to the
> executor would probably be pretty localized, but the ramifications in
> the planner could be extensive --- especially if you want to do plans
> involving ANDed or ORed bitmaps.

Also going to "smart inserter" which can do local clustering on sets of
real bitmap indexes for INSERTS (and INSERT side of UPDATE) would
probably be a major change from our current "stupid inserter" ;)

This will not be needed for bitmap resolution higher than 1bit/page but
default local clustering on bitmap indexes will probably buy us some
extra performance. by avoiding data page fetches when such indexes are
used.

AN anyway the support for INSERT being aware of clustering will probably
come up sometime.

------------
Hannu

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Neil Conway 2002-08-07 15:20:19 Re: Open 7.3 items
Previous Message Richard Tucker 2002-08-07 15:12:00 Re: PITR, checkpoint, and local relations