Re: Bitmap index thoughts

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bitmap index thoughts
Date: 2006-12-27 10:49:12
Message-ID: 45924FA8.2010707@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Gavin Sherry wrote:
> On Tue, 26 Dec 2006, Heikki Linnakangas wrote:
>> for typical bitmap index use cases and most of the needed pages should
>> stay in memory, but could we simplify this? Why do we need the auxiliary
>> heap, couldn't we just store the blk+offset of the LOV item directly in
>> the b-tree index item?
>
> The problem is, the b-tree code is very much tied to the heap. I don't
> want to modify the b-tree code to make bitmap indexes work (better).
> What's really tempting is to just manage our own balanced tree within the
> bitmap index file(s) itself. It would start from the metapage and simply
> spill to other 'special' index pages when necessary. The problem is, we do
> not have b-tree code generic enough that it would allow us to do this
> trivially -- consider concurrency and WAL in particular, which we
> currently get for free. I guess this is why I've been ignoring this issue
> :-).

Maybe we could reuse the code in ginbtree.c. Looks like Teodor & Oleg
had the same problem :).

Modifying the nbtree code doesn't seem that difficult either. AFAICS,
the only places where the heap is from within nbtree code is in index
building and uniqueness checks.

>> And instead of having separate LOV pages that store a number of LOV
>> items, how about storing each LOV item on a page of it's own, and using
>> the rest of the page to store the last chunk of the bitmap. That would
>> eliminate one page access, but more importantly, maybe we could then get
>> rid of all the bm_last_* attributes in BMLOVItemData that complicate the
>> patch quite a bit, while preserving the performance.
>
> That's an interesting approach. We would still need a concept of
> last_word, at the very least, and probably last_comp_word for convenience.

Why?

> PS: Another versio of the patch shall be forthcoming shortly. I've been
> working on compressing the data in memory during CREATE INDEX instead of
> just managing arrays of TIDs in memory as we did previously. The array of
> TIDs works great for well clustered data but it stinks for poorly
> clustered data as we approach maintenance_word_mem and have to swap a lot.

Ok, sounds good.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2006-12-27 10:58:55 Re: Bitmap index thoughts
Previous Message Dhanaraj M 2006-12-27 10:02:51 Allow the identifier length to be increased via a configure option