Re: On-disk bitmap index patch

From: "Ayush Parashar" <aparashar(at)greenplum(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Luke Lonergan" <LLonergan(at)greenplum(dot)com>
Cc: mark(at)mark(dot)mielke(dot)cc, "Bruce Momjian" <bruce(at)momjian(dot)us>, "Jie Zhang" <jzhang(at)greenplum(dot)com>, "Hannu Krosing" <hannu(at)skype(dot)net>, "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, "majordomo(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-26 06:51:45
Message-ID: C0EC5F11.3603%aparashar@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> I find those results moderately unconvincing, primarily because they
> are based on choosing the least efficient possible data representation
> (viz char(n)), and thus the btree indexes suffer severe and quite
> artificial bloat. A database schema chosen with even minimal attention
The specific data was chosen in presentation, because it comes from TPC-H
definition (data that everybody understands) and they were the only columns
where bitmap index will be more beneficial (i.e. the columns were low
cardinality ones).

> to PG-specific tuning would probably have btree indexes half the size.
> That wouldn't completely eliminate the performance differential shown,
> but it would bring it down into the ballpark where you have to question
> whether it makes sense to support another index AM.
Comparison of index creation time and index size for similar cardinalities
for *integer* and *numeric* data-types for b-tree and bitmap index. Benefits
are seen in all the cases, index creation time, space taken by index as well
as querying:

Total rows in the table: 2 million
Size of table in kbytes: 431976
Table definition and column cardinalities:
Column | Type | Cardinality
--------+-------------------+-----------
c1 | integer |
i10 | integer | 10
i50 | integer | 50
n10 | numeric | 10
n50 | numeric | 50
ctext | character varying |

Note: Time in seconds and size in kbytes (as seen from select
pg_relation_size()):

-------------------------------------------------------------------------
Column B-tree size Bitmap size B-tree time Bitmap time
-------------------------------------------------------------------------
i10 35056 2392 11.8 6.0
i50 35056 4328 10.8 6.4
n10 52264 2656 34.8 9.6
n50 52616 4272 34.3 11.7
-------------------------------------------------------------------------

Query timings (in seconds):
-------------------------------------------------------------------------
Query B-tree Bitmap
-------------------------------------------------------------------------
select count(*) from btree_test where 4.08 1.61
i10=5 and i50=2 and n10=0.1 and n50=0.05;
-------------------------------------------------------------------------

This test case fits in memory, the benefits seen will be more if we take
bigger test case.

I will like to re-iterate the benefits of bitmap index:
1. Size and time to create bitmap index is less than b-tree index for low
cardinality column of any data-type.
2. The gain in query performance can be attributed to ŒBitmap And¹ and
ŒBitmap Or¹ operations being more efficient for bitmap indexes as compared
to b-trees. Note that both bitmap and b-tree indexes use the bitmap index
scan access method, however the behavior is different for each. With a
b-tree index, the b-tree indexes are scanned to create a temporary in-memory
bitmap index. With the Bizgres on-disk bitmap index, the bitmap scan
retrieves several on-disk bitmap pages at once and provides them to the
ŒBitmap And¹ and ŒBitmap Or¹ operators. The smaller size of the bitmap
indexes combined with the efficiency of the And and Or operators are the
reasons for the increase in query speed.

Ayush

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2006-07-26 07:36:04 Re: Refactoring the API for amgetmulti
Previous Message craigp 2006-07-26 06:44:26 INSERT ... RETURNING in 8.2