From: | Heikki Linnakangas <heikki(at)enterprisedb(dot)com> |
---|---|
To: | Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Bitmap index thoughts |
Date: | 2006-12-27 10:49:12 |
Message-ID: | 45924FA8.2010707@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Gavin Sherry wrote:
> On Tue, 26 Dec 2006, Heikki Linnakangas wrote:
>> for typical bitmap index use cases and most of the needed pages should
>> stay in memory, but could we simplify this? Why do we need the auxiliary
>> heap, couldn't we just store the blk+offset of the LOV item directly in
>> the b-tree index item?
>
> The problem is, the b-tree code is very much tied to the heap. I don't
> want to modify the b-tree code to make bitmap indexes work (better).
> What's really tempting is to just manage our own balanced tree within the
> bitmap index file(s) itself. It would start from the metapage and simply
> spill to other 'special' index pages when necessary. The problem is, we do
> not have b-tree code generic enough that it would allow us to do this
> trivially -- consider concurrency and WAL in particular, which we
> currently get for free. I guess this is why I've been ignoring this issue
> :-).
Maybe we could reuse the code in ginbtree.c. Looks like Teodor & Oleg
had the same problem :).
Modifying the nbtree code doesn't seem that difficult either. AFAICS,
the only places where the heap is from within nbtree code is in index
building and uniqueness checks.
>> And instead of having separate LOV pages that store a number of LOV
>> items, how about storing each LOV item on a page of it's own, and using
>> the rest of the page to store the last chunk of the bitmap. That would
>> eliminate one page access, but more importantly, maybe we could then get
>> rid of all the bm_last_* attributes in BMLOVItemData that complicate the
>> patch quite a bit, while preserving the performance.
>
> That's an interesting approach. We would still need a concept of
> last_word, at the very least, and probably last_comp_word for convenience.
Why?
> PS: Another versio of the patch shall be forthcoming shortly. I've been
> working on compressing the data in memory during CREATE INDEX instead of
> just managing arrays of TIDs in memory as we did previously. The array of
> TIDs works great for well clustered data but it stinks for poorly
> clustered data as we approach maintenance_word_mem and have to swap a lot.
Ok, sounds good.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Heikki Linnakangas | 2006-12-27 10:58:55 | Re: Bitmap index thoughts |
Previous Message | Dhanaraj M | 2006-12-27 10:02:51 | Allow the identifier length to be increased via a configure option |