From: | "Dann Corbit" <DCorbit(at)connx(dot)com> |
---|---|
To: | "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "Neil Conway" <nconway(at)klamath(dot)dyndns(dot)org>, <mloftis(at)wgops(dot)com>, <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: What is wrong with hashed index usage? |
Date: | 2002-06-21 20:20:18 |
Message-ID: | D90A5A6C612A39408103E6ECDD77B82906F47C@voyager.corporate.connx.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> -----Original Message-----
> From: Bruce Momjian [mailto:pgman(at)candle(dot)pha(dot)pa(dot)us]
> Sent: Friday, June 21, 2002 9:52 AM
> To: Tom Lane
> Cc: Neil Conway; mloftis(at)wgops(dot)com; Dann Corbit;
> pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] What is wrong with hashed index usage?
>
>
> Tom Lane wrote:
> > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > I remember three problems: build time, index size, and
> concurrency
> > > problems. I was wondering about the equal key case
> myself, and assumed
> > > hash may be a win there, but with the concurrency
> problems, is that even
> > > possible?
> >
> > Sure. Many-equal-keys are a problem for btree whether you have any
> > concurrency or not.
> >
> > > OK, I have reworded it. Is that better?
> >
> > It's better, but you've still discarded the original's
> explicit mention
> > of concurrency problems. Why do you want to remove information?
>
> OK, concurrency added. How is that?
>
> >
> > > How about an elog(NOTICE) for hash use?
> >
> > I don't think that's appropriate.
>
> I was thinking of this during CREATE INDEX ... hash:
>
> NOTICE: Hash index use is discouraged. See the CREATE INDEX
> reference page for more information.
>
> Does anyone else like/dislike that?
I think it might be OK temporarily, to show that there is some work that
needs done. When hashed indexes are fixed, the notice should be
removed.
I have not looked at the hash code. Here is a strategy (off the top of
my head) that seems that it should work:
Use Bob Jenkins' 64 bit generic hash from here (totally free for use and
fast as blazes):
http://burtleburtle.net/bob/hash/index.html
Specifically:
http://burtleburtle.net/bob/c/lookup8.c and routine: hash( k, length,
level)
Now, with a 64 bit hash, there is very tiny probability of a collision
(but you could have duplicate data).
The hash index would consist of nothing more than this:
[long long hash=64 bit hash code][unsigned nmatches=count of matching
hashes][array of {nmatches} pointers directly to the records with that
hash]
This is probably grotesqely oversimplified. But maybe it will spur an
indea in the person who writes the indexing code.
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2002-06-21 20:31:17 | Re: What is wrong with hashed index usage? |
Previous Message | Larry Rosenman | 2002-06-21 20:13:55 | Re: What is wrong with hashed index usage? |