Re: CLUSTER and indisclustered

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Curt Sampson <cjs(at)cynic(dot)net>, mark Kirkwood <markir(at)slithery(dot)org>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: CLUSTER and indisclustered
Date: 2002-08-13 04:25:35
Message-ID: 200208130425.g7D4PZm26976@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


I wanted to comment on this bitmapped index discussion because I am
hearing a lot about star joins, data warehousing, and bitmapped indexes
recently.

It seems we have several uses for bitmapped indexes:

Do index lookups in sequential heap order
Allow joining of bitmapped indexes to construct arbitrary indexes

There is a web page about "star joins" used a lot in data warehousing,
where you don't know what queries are going to be required and what
indexes to create:

http://www.dbdomain.com/a100397.htm

They show some sample queries, which is good. Here is some
interesting text:

Star Transformation

If there are bitmap indexes on SALES_REP_ID, PRODUCT_ID, and
DEPARTMENT_ID in the SALES table, then Oracle can resolve the query
using merges of the bitmap indexes.

Because Oracle can efficiently merge multiple bitmap indexes, you can
create a single bitmap index on each of the foreign-key columns in the
fact table rather than on every possible combination of columns. This
lets you support all possible combinations of dimensions without
creating an unreasonable number of indexes.

Added to TODO:

* Use bitmaps to fetch heap pages in sequential order [performance]
* Use bitmaps to combine existing indexes [performance]

and I will add some of these emails to TODO.detail/performance.

---------------------------------------------------------------------------

Tom Lane wrote:
> Curt Sampson <cjs(at)cynic(dot)net> writes:
> > But after doing some benchmarking of various sorts of random reads
> > and writes, it occurred to me that there might be optimizations
> > that could help a lot with this sort of thing. What if, when we've
> > got an index block with a bunch of entries, instead of doing the
> > reads in the order of the entries, we do them in the order of the
> > blocks the entries point to?
>
> I thought to myself "didn't I just post something about that?"
> and then realized it was on a different mailing list. Here ya go
> (and no, this is not the first time around on this list either...)
>
>
> I am currently thinking that bitmap indexes per se are not all that
> interesting. What does interest me is bitmapped index lookup, which
> came back into mind after hearing Ann Harrison describe how FireBird/
> InterBase does it.
>
> The idea is that you don't scan the index and base table concurrently
> as we presently do it. Instead, you scan the index and make a list
> of the TIDs of the table tuples you need to visit. This list can
> be conveniently represented as a sparse bitmap. After you've finished
> looking at the index, you visit all the required table tuples *in
> physical order* using the bitmap. This eliminates multiple fetches
> of the same heap page, and can possibly let you get some win from
> sequential access.
>
> Once you have built this mechanism, you can then move on to using
> multiple indexes in interesting ways: you can do several indexscans
> in one query and then AND or OR their bitmaps before doing the heap
> scan. This would allow, for example, "WHERE a = foo and b = bar"
> to be handled by ANDing results from separate indexes on the a and b
> columns, rather than having to choose only one index to use as we do
> now.
>
> Some thoughts about implementation: FireBird's implementation seems
> to depend on an assumption about a fixed number of tuple pointers
> per page. We don't have that, but we could probably get away with
> just allocating BLCKSZ/sizeof(HeapTupleHeaderData) bits per page.
> Also, the main downside of this approach is that the bitmap could
> get large --- but you could have some logic that causes you to fall
> back to plain sequential scan if you get too many index hits. (It's
> interesting to think of this as lossy compression of the bitmap...
> which leads to the idea of only being fuzzy in limited areas of the
> bitmap, rather than losing all the information you have.)
>
> A possibly nasty issue is that lazy VACUUM has some assumptions in it
> about indexscans holding pins on index pages --- that's what prevents
> it from removing heap tuples that a concurrent indexscan is just about
> to visit. It might be that there is no problem: even if lazy VACUUM
> removes a heap tuple and someone else then installs a new tuple in that
> same TID slot, you should be okay because the new tuple is too new to
> pass your visibility test. But I'm not convinced this is safe.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannu Krosing 2002-08-13 04:27:03 Re: CLUSTER and indisclustered
Previous Message Hannu Krosing 2002-08-13 04:19:56 Re: OOP real life example (was Re: Why is MySQL more