From: | "Ayush Parashar" <aparashar(at)greenplum(dot)com> |
---|---|
To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Luke Lonergan" <LLonergan(at)greenplum(dot)com> |
Cc: | mark(at)mark(dot)mielke(dot)cc, "Bruce Momjian" <bruce(at)momjian(dot)us>, "Jie Zhang" <jzhang(at)greenplum(dot)com>, "Hannu Krosing" <hannu(at)skype(dot)net>, "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, "majordomo(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: On-disk bitmap index patch |
Date: | 2006-07-26 06:51:45 |
Message-ID: | C0EC5F11.3603%aparashar@greenplum.com |
Lists: pgsql-hackers
> I find those results moderately unconvincing, primarily because they
> are based on choosing the least efficient possible data representation
> (viz char(n)), and thus the btree indexes suffer severe and quite
> artificial bloat. A database schema chosen with even minimal attention
The specific data was chosen for the presentation because it comes from the
TPC-H definition (data that everybody understands), and those were the only
columns where a bitmap index would be more beneficial (i.e. the
low-cardinality columns).
> to PG-specific tuning would probably have btree indexes half the size.
> That wouldn't completely eliminate the performance differential shown,
> but it would bring it down into the ballpark where you have to question
> whether it makes sense to support another index AM.
Below is a comparison of index creation time and index size at similar
cardinalities, for *integer* and *numeric* data types, for b-tree and bitmap
indexes. Benefits are seen in all cases: index creation time, space taken by
the index, as well as query performance:
Total rows in the table: 2 million
Size of table in kbytes: 431976
Table definition and column cardinalities:
 Column |       Type        | Cardinality
--------+-------------------+-------------
 c1     | integer           |
 i10    | integer           | 10
 i50    | integer           | 50
 n10    | numeric           | 10
 n50    | numeric           | 50
 ctext  | character varying |
Note: Time in seconds and size in kbytes (as seen from select
pg_relation_size()):
-------------------------------------------------------------------------
Column      B-tree size    Bitmap size    B-tree time    Bitmap time
-------------------------------------------------------------------------
i10               35056           2392           11.8            6.0
i50               35056           4328           10.8            6.4
n10               52264           2656           34.8            9.6
n50               52616           4272           34.3           11.7
-------------------------------------------------------------------------
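For reference, a minimal sketch of how one such measurement could be
reproduced. The data generation, the md5() text column, and the "bitmap"
access method name in CREATE INDEX are assumptions (the patch installs its
own AM); only the column names, row count and cardinalities come from the
table definition above:

-- Hypothetical data generation; the value distributions are assumptions
-- chosen only to match the cardinalities listed above.
CREATE TABLE btree_test AS
SELECT g                            AS c1,
       (g % 10)                     AS i10,
       (g % 50)                     AS i50,
       ((g % 10) / 10.0)::numeric   AS n10,
       ((g % 50) / 100.0)::numeric  AS n50,
       md5(g::text)::varchar        AS ctext
FROM generate_series(1, 2000000) AS g;

\timing on

-- One b-tree and one bitmap index per measured column.
CREATE INDEX i10_btree  ON btree_test USING btree  (i10);
CREATE INDEX i10_bitmap ON btree_test USING bitmap (i10);

-- Sizes in kbytes, matching the figures in the table above.
SELECT pg_relation_size('i10_btree')  / 1024 AS btree_kb,
       pg_relation_size('i10_bitmap') / 1024 AS bitmap_kb;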
Query timings (in seconds):
-------------------------------------------------------------------------
Query                                                    B-tree    Bitmap
-------------------------------------------------------------------------
select count(*) from btree_test where                      4.08      1.61
  i10=5 and i50=2 and n10=0.1 and n50=0.05;
-------------------------------------------------------------------------
This test case fits in memory; the benefits would be even larger with a
bigger test case.
I would like to reiterate the benefits of the bitmap index:
1. The size of, and time to create, a bitmap index are smaller than for a
b-tree index on a low-cardinality column of any data type.
2. The gain in query performance can be attributed to the "Bitmap And" and
"Bitmap Or" operations being more efficient for bitmap indexes than for
b-trees. Note that both bitmap and b-tree indexes use the bitmap index
scan access method, but the behavior is different for each. With a
b-tree index, the b-tree is scanned to build a temporary in-memory
bitmap. With the Bizgres on-disk bitmap index, the bitmap scan
retrieves several on-disk bitmap pages at once and provides them to the
"Bitmap And" and "Bitmap Or" operators. The smaller size of the bitmap
indexes, combined with the efficiency of the And and Or operators, is the
reason for the increase in query speed.
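As an illustration of point 2, the plan shape can be inspected directly
(the query is the one from the timing table above; the plan node names are
the standard ones, and the difference between the two AMs is inside the
Bitmap Index Scan nodes):

-- With one index per WHERE column, the planner combines the per-column
-- results with a BitmapAnd node feeding a Bitmap Heap Scan. With b-tree
-- indexes each Bitmap Index Scan builds its bitmap in memory by walking
-- the b-tree; with the on-disk bitmap AM it reads the stored bitmap pages.
EXPLAIN ANALYZE
SELECT count(*)
FROM btree_test
WHERE i10 = 5 AND i50 = 2 AND n10 = 0.1 AND n50 = 0.05;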
Ayush