From: | "Jie Zhang" <jzhang(at)greenplum(dot)com> |
---|---|
To: | "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au> |
Cc: | pgsql-patches(at)postgresql(dot)org |
Subject: | Re: On-disk bitmap index implementation |
Date: | 2006-12-04 18:13:46 |
Message-ID: | C199A55A.C246%jzhang@greenplum.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-patches |
On 12/4/06 8:22 AM, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> wrote:
> Gavin Sherry wrote:
>> Hi all,
>>
>> Attached is a patch implementing bitmap indexes. It includes major
>> enhancements on the patch submitted during feature freeze for 8.2 here[1].
>>
>> In particular: much better integration with the existing bitmap scan code
>> with the internals of the bitmap streaming pushed down into the AM and
>> hidden from the executor code; completely new index creation algorithm
>> which reduced creation time by 20-75% depending on the data; modifications
>> to the encoding mechanism to suit the integration with bitmap index scans;
>> work on memory management; lots of code rewriting; range query support.
>> The code is also much cleaner now.
>
> Thanks! I'll take a look at it.
Thanks!
>
> We need to give the indexam API some further thought. As you know, I've
> been working on the Grouped Index Tuples stuff, which also requires
> changes to the API to get full benefit. There's a bunch of functionality
> I'd like to see:
>
> * Support for streamed bitmaps, like you have implemented.
Yes. It is also interesting to see how this type of stream bitmaps works the
one from the bitmap index.
>
> * Support for candidate matches. This is needed for GIT, as well as
> range-encoded bitmap indexes if/when we add them.
>
We have replace amgetmulti with amgetbitmap. This should be supported by
setting PagetableEntry accordingly.
> * Support for returning tuples in partial order. This is again needed
> for GIT, because grouped tuples don't keep track of the ordering within
> the group, so they need to be sorted if ordering necessary. And again
> it's also useful to return tuples in order from range-encoded bitmaps.
The bitmap index does not guarantee that returning tuples are ordered, but
it guarantees that the tuples from the same heap page will be returned
consecutively. I don't quite understand why this is useful for
range-encoding bitmaps in particular.
>
> * Support for kill_prior_tuple on bitmap scans.
>
> * A bulk insert API. When inserting a lot of tuples with similar keys,
> we could a considerable amount of CPU with a bulk insert API. A bulk
> insert to a B-tree for example would only need to descend the tree once,
> find the insert location, lock the target page just once and insert all
> the tuples that belong to that page. That would potentially also reduce
> WAL traffic.
>
Yes. Currently during the bitmap index creation, we maintain a buffer to
buffer many inserting tuples before writing them to bitmap pages, which
improves the creation performance by 30%-200% depending on the cardinalities
of the attribute to be indexed. This bulk insert API can take advantage of
this as well.
Thanks,
Jie
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2006-12-04 18:19:03 | Re: Bundle of patches |
Previous Message | Chris Browne | 2006-12-04 17:40:28 | Re: ECPG docs |