From: | Mischa Sandberg <mischa(dot)sandberg(at)telus(dot)net> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL |
Date: | 2005-05-10 18:12:45 |
Message-ID: | 1115748765.4280f99d52dd5@webmail.telus.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-performance |
Quoting "Jim C. Nasby" <decibel(at)decibel(dot)org>:
> I'm not really familiar enough with hash indexes to know if this
> would
> work, but if the maximum bucket size was known you could use that to
> determine a maximum range of buckets to look at. In some cases, that
> range would include only one bucket, otherwise it would be a set of
> buckets. If you found a set of buckets, I think you could then just
> go
> to the specific one you need.
>
> If we assume that the maximum bucket size is one page it becomes
> more
> realistic to take an existing large bucket and split it into several
> smaller ones. This could be done on an update to the index page, or
> a
> background process could handle it.
>
> In any case, should this go on the TODO list?
>
> > Allowing a bucket size to be specified at CREATE INDEX doesn't seem
> out
> > of line though. We'd have to think up a scheme for
> index-AM-specific
> > index parameters ...
> --
> Jim C. Nasby, Database Consultant decibel(at)decibel(dot)org
> Give your computer some brain candy! www.distributed.net Team #1828
Google "dynamic hash" or "linear hash". It takes care of not needing to
have varying bucket sizes.
Hash indexes are useful if you ALWAYS require disk access; they behave
like worst-case random cache-thrash tests. That's probably why dbs have
gravitated toward tree indexes instead. On the other hand, there's more
(good) to hashing than initially meets the eye.
Dynamic multiway hashing has come a long way from just splicing the bits
together from multiple columns' hash values. If you can lay your hands
on Tim Merrett's old text "Relational Information Systems", it's an
eye-opener. Picture an efficient terabyte spreadsheet.
For one thing, unlike a btree, a multicolumn hash is symmetric: it
doesn't matter which column(s) you do not specify in a partial match.
For another, a multiway hash is useful for much lower selectivity than a
btree. I built such indexes for OLAP cubes, and some dimensions were
only 10 elements wide. At the point where btree indexing becomes worse
than seqscan, a multiway hash tells you which 10% of the disk to scan.
From | Date | Subject | |
---|---|---|---|
Next Message | Brendan Jurd | 2005-05-10 18:13:48 | Re: Shorthand for foreign key indices |
Previous Message | Tom Lane | 2005-05-10 18:07:00 | Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL |
From | Date | Subject | |
---|---|---|---|
Next Message | Greg Stark | 2005-05-10 18:13:11 | Re: Prefetch |
Previous Message | Tom Lane | 2005-05-10 18:07:00 | Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL |