From: | mark(at)mark(dot)mielke(dot)cc |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | Jie Zhang <jzhang(at)greenplum(dot)com>, Hannu Krosing <hannu(at)skype(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org, Luke Lonergan <LLonergan(at)greenplum(dot)com> |
Subject: | Re: On-disk bitmap index patch |
Date: | 2006-07-25 01:58:05 |
Message-ID: | 20060725015805.GA30087@mark.mielke.cc |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Jul 24, 2006 at 09:04:28PM -0400, Bruce Momjian wrote:
> Jie Zhang wrote:
> > > IIRC they quoted the cardinality of 10000 as something that is still
> > > faster than btree for several usecases.
> > >
> > > And also for AND-s of several indexes, where indexes are BIG, your btree
> > > indexes may be almost as big as tables but the resulting set of pages is
> > > small.
> > Yeah, Hannu points it out very well -- the bitmap index works very well
> > when columns have low cardinalities and AND operations will produce small
> > number of results.
> What operations on columns of low cardinality produce a small number of
> results? That seems contradictory.
Not necessarily. Cardinality of 2 means 1/2. 3 means 1/3. 4 means 1/4. Etc.
Reading 1/4, for a larger table, has a good chance of being faster than
reading 4/4 of the table. :-)
No opinion on whether such tables exist in the real world - but the
concept itself seems sound.
> > Also, the bitmap index is very small in low cardinality cases, where the
> > btree tends to take up at least 10 times more space.
> Also, are adding/changing rows is more expensive with bitmaps than
> btrees?
Without looking at the code, but having read the Oracle docs, I would
guess yes. Every vacuum/delete would need to clear all of the bits for
the row. Every insert/update would need to allocate a bit in at least
the bitmap tree for the row inserted/updated. Seems like more pages
may need to be written. Although perhaps some clever optimization
would limit this. :-)
It seems interesting though. We won't really know until we see the
benchmarks. I'm seeing it as a form of working directly with the
intermediate form of the bitmap index scanner. If the existing index
scanner, building the bitmaps on the fly can out-perform the regular
index scanner, I'm seeing potentially in a pre-built bitmap.
Obviously, it isn't the answer to everything. The use I see for it,
are a few of my 1:N object attribute tables. The table has an object
identifier, and a column indicating the attribute type that the value
is for. If I have 20 or more attribute type values, however, most
objects include rows for most attribute types, my regular index ends
up repeating the attribute type for every row. If I want to scan the
table for all rows that have a particular attribute type with a
particular value, it's a seqscan right now. With the bitmap scanner,
knowing which rows to skip to immediately is readily available.
Will it be worth it or not? I won't know until I try it. :-)
Cheers,
mark
--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2006-07-25 01:59:41 | Re: LDAP patch & feature freeze |
Previous Message | Gavin Sherry | 2006-07-25 01:51:05 | Re: On-disk bitmap index patch |