From: | bob_jenkins(at)burtleburtle(dot)net |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Hash indexes (was: On-disk bitmap index patch) |
Date: | 2006-08-01 21:26:18 |
Message-ID: | 1154467577.962440.53040@i42g2000cwa.googlegroups.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Kenneth Marshall wrote:
> On Fri, Jul 28, 2006 at 12:14:49PM -0500, Jim C. Nasby wrote:
> > On Thu, Jul 27, 2006 at 01:46:01PM -0400, Alvaro Herrera wrote:
> > > Jim Nasby wrote:
> > > > On Jul 25, 2006, at 3:31 PM, Tom Lane wrote:
> > > > >Hannu Krosing <hannu(at)skype(dot)net> writes:
> > >
> > > > >>What would be the use-case for hash indexes ? And what should be
> > > > >>done to make them faster than btree ?
> > > > >
> > > > >If we knew, we'd do it ;-) But no one's put enough effort into it
> > > > >to find out.
> > > >
> > > > Do they use the same hash algorithm as hash joins/aggregation? If so,
> > > > wouldn't hash indexes be faster for those operations than regular
> > > > indexes?
> > >
> > > The main problem doesn't seem to be in the hash algorithm (which I
> > > understand to mean the hashing function), but in the protocol for
> > > concurrent access of index pages, and the distribution of keys in pages
> > > of a single hash key.
> > >
> > > This is described in a README file or a code comment somewhere in the
> > > hash AM code. Someone needs to do some profiling to find out what the
> > > bottleneck really is, and ideally find a way to fix it.
> >
> > What I'm getting at is that I've never seen any explanation for the
> > theoretical use cases where a hash index would outperform a btree. If we
> > knew what kind of problems hash indexes were supposed to solve, we could
> > try and interest people who are solving those kinds of problems in
> > fixing hash indexes.
>
> The big win for hash indexes is the idea that searching for a single
> value should only take 1 I/O operation in a perfect world. Btree can
> not do that.
Hash indexes stored on disk still need a level of indirection -- you've
got to look up what range of blocks contains your hash value. How big
your table of ranges is depends on how big the hash index is and how
big your ranges are. Almost always you can fit that table into a block
cached in memory. But, the root of a BTree is often cached in memory
too. So there's no advantage for a hash index over a BTree index until
the BTree needs to grow to three levels deep, what is that, usually
10,000 or 100,000 records. Beyond that, you're right, the BTree slowly
grows deeper while the hash index doesn't.
From | Date | Subject | |
---|---|---|---|
Next Message | Rod Taylor | 2006-08-01 21:27:31 | Re: GENERATED ... AS IDENTITY, Was: Re: Feature Freeze |
Previous Message | Hannu Krosing | 2006-08-01 21:22:32 | Re: Hash indexes |