From: | Leonardo F <m_lists(at)yahoo(dot)it> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: bitmap indexes - performance |
Date: | 2010-07-01 16:30:39 |
Message-ID: | 896669.40438.qm@web29004.mail.ird.yahoo.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> In
> principle a bitmap index scan should be significantly faster if the
> index can
> return the bitmap more or less "natively" rather than having
> to construct
> it.
The problem I'm seeing is that even on a 20M rows table, doing a
select * from t where c1=10 and c2=1
where c1 and c2 are low cardinality columns, leads to a *very*
fast bitmap index scan, even with btree indexes (200ms per index
on my PC).
The rest of the time is spent in actually retrieving heap rows; and
of course no index type is going to help with that.
Now: if an index search on such a big table takes so little time,
what kind of improvement are we trying to get?
The btree indexes on c1 and c2 are about 340MB eaxh: maybe
I'm experiencing some caching weirdness? Or it's normal that an
index search on such a big table is that fast (again, not counting
the heap scan step, which will be required no matter the index
type)? I'll try to re-test it...
> In particular, I recall some discussions about developing
> a "streaming
> API" whereby an index AM could return a bitmap page-by-page or
> so,
> rather than having to construct the whole thing in-memory
> before
> anything could happen. This would be a huge win for AND/OR
> cases,
> and even for a simple indexscan it would eliminate the existing
> startup
> cost penalty for a bitmap scan. Streaming like this would
> also
> eliminate the problem of having to lossify large bitmaps in order
> to
> not overrun memory.
One of the improvements I was going to try was to avoid calling
tid_set_bit (or whatever is the function, I don't remember now) for
every row, and call something like tid_set_bits_in_page where
a whole page was passed in: this would remove a lot of the hash_*
calls that are made in each and every tid_set_bit call (now that's
something btree can't do, but bitmap indexes can do "easily").
But I stopped before implementing it, because, as I said, I don't
think the improvement would still be worth it (even calling
tid_set_bit 1/20th of the needed times didn't help that much; we're
still talking about going from 200ms to 180ms on a query that
takes seconds to execute). But I'm going to give more "tested"
numbers...
Talking about bitmap indexes I don't think we should mention
memory... I mean: bitmap indexes are supposed to be used on
huge tables, and I don't think that 100MB (which holds a lot of
rows in a tbm...) to spare as work_mem would be a big problem...
As for the "startup cost": again, I wouldn't see that as a big
improvement, as we're talking mostly OLAP scenarios, where
most likely there will be some other "blocking" operator (group by,
sort, sub select etc) that will "remove" any improvements in
startup time...
To sum up: IMHO nor improvements in memory usage nor
in startup time would be good reasons to switch to bitmap
indexes... but bulk index creation time (10 minutes to index
what it takes 60 minutes with btree... and maybe more if tables
are bigger) and (maybe) index disk space might be...
but I'm not 100% convinced...
I'm trying to find more docs that explain the "improvements" of
bitmap indexes in other products... but most of what I've found
talks about bitmapAND/OR.... which is something that is very
cool, but that postgres already does even with btree indexes...
or index creation time/size, which are, for the moment, the only
things that I'm pretty confident the patch would actually provide.
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2010-07-01 16:46:15 | Re: log files and permissions |
Previous Message | Mike Fowler | 2010-07-01 16:25:53 | Re: Issue: Deprecation of the XML2 module 'xml_is_well_formed' function |