Re: State of the on-disk bitmap index

From: Daniel Bausch <bausch(at)dvs(dot)tu-darmstadt(dot)de>
To: Gianni Ciolli <gianni(dot)ciolli(at)2ndquadrant(dot)it>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, pgsql-hackers(at)postgresql(dot)org, Gabriele Bartolini <gabriele(dot)bartolini(at)2ndquadrant(dot)it>
Subject: Re: State of the on-disk bitmap index
Date: 2012-09-05 11:37:59
Message-ID: 50473997.2050706@dvs.tu-darmstadt.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Gianni!

Thank you for your attention and response!

>> I used the (more recent) patches posted by Gianni Ciolli in 2008 and
>> currently am in the process of porting those to master HEAD -- like I
>> promised.
>
> Back in 2008 the PostgreSQL project wasn't using git, and I wasn't
> either; hence that patch is the best starting point I can find.

Ok, fine. However, while I do not find the mail at the moment, I think,
someone said, he fixed the VACUUM. Additionally, the Wiki lists Simon
and Jonah as the last authors, pretending they prepared a patch for 8.5.

>> IIRC, it was already shown that bitmap indexes can speed up TPC-H
>> queries. I will compare B+-tree, bitmap, and encoded bitmap indexes.
>
> I think what Albe meant (also what we attempted back then, if memory
> serves me, but without reaching completion) is a set of tests which
> show a significant benefit of your implementation over the existing
> index type implementations in PostgreSQL, to justify the increased
> complexity of the source code.
>
> The kind of test I have in mind is: a big table T with a
> low-cardinality column C, such that a btree index on C is
> significantly larger than the corresponding bitmap index on the same
> column.
>
> Create the bitmap index, and run a query like
>
> SELECT ... FROM T WHERE C = ...
>
> more than once; then you should notice that subsequent scans are much
> faster than the first run, because the index is small enough to fit
> the shared memory and will not need to be reloaded from disk at every
> scan.
>
> Then drop the bitmap index, and create a btree index on the same
> column; this time the index will be too large and subsequent scans
> will be slow, because the index blocks must be reloaded from disk at
> every scan.
>
> Hope that helps;

Is that, what your bmi-perf-test.tar.gz from 2008 does? I did not look
into that. I will at least do something like you just described plus
some TPC-H test. As the encoding helps against the cardinality
problems, I will also draw comparisons with different cardinalities.

Yours sincerely,
Daniel

--
Daniel Bausch
Wissenschaftlicher Mitarbeiter
Technische Universität Darmstadt
Fachbereich Informatik
Fachgebiet Datenbanken und Verteilte Systeme

Hochschulstraße 10
64289 Darmstadt
Germany

Tel.: +49 6151 16 6706
Fax: +49 6151 16 6229

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2012-09-05 12:56:57 Re: Proof of concept: standalone backend with full FE/BE protocol
Previous Message Kohei KaiGai 2012-09-05 11:30:37 Re: [bugfix] sepgsql didn't follow the latest core API changes