Re: Hash index todo list item

From: Jens-Wolfhard Schicke <j(dot)schicke(at)asco(dot)de>
To:
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-10 11:12:33
Message-ID: AF62AE70763EFC85F410F42A@[192.168.1.85]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

More random thoughts:

- Hash-Indices are best for unique keys, but every table needs a new hash
key, which means one more random page access. Is there any way to build
multi-_table_ indices? A join might then fetch all table rows with a given
unique key after one page fetch for the combined index.

- Hashes with trivial hash-functions (like identity) can also return rows
in a desired order.

- Is there a case where a sequentially scanning a hash-index is useful? I
can't find any, but maybe somebody else has a use-case.

- What about HashJoins when the referenced tables have hash-indices?

- What about hash-indices where entries are inserted for multiple columns.
Assume a table like this:

CREATE TABLE link (obj_id1 INT4, obj_id2 INT4);

and a query like

SELECT * FROM link WHERE ? IN (obj_id1, obj_id2);

or some join using a similar condition. It might be a nice thing to insert
entries at both HASH(obj_id1) and HASH(obj_id2), which would eliminate the
need to check in two indices and do a bitmap OR. OTOH it might not be
faster in any significant use cases because who'd need a link table with
nearly unique linked objects?

- In cases where the distribution of the hash-function is good, but a small
and relatively even number of rows exist for each key (like it might be the
case in the above example), it might be nice to reserve a given amount of
same-key row entries in each bucket, and hold a fill-count at the front of
it. That would avoid costly page fetches after each collision. You'd create
a hash-index with n-buckets, each m-elements large. When the bucket is
full, the usual collision handling continues.

- About hash enlargement: What about always using only the first k bits of
each hash value. When you find that the hash is "quite-full" (however that
is defined and detected), k is increased by one, effectively doubling the
hash size. New entries are then written as usual, while retrieving the old
entries needs to test at the k-bit-position first and if there is a miss,
also at the k-1-position and so forth. To limit this search, some
background process could after analyzing the index move old entries to the
now correct k-bit-position and increment some "min-k"-value once all old
entries have been moved. After the hash has been increased, the index would
approximately half it's speed for some time then. Additionally one could
also insert the entry at the new position if it has been found at the old
one only while using the index. A special "miss"-entry at the new position
doesn't help if nothing could be found because the old positions will
usually hold some data which resides there even if it uses k bits.

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2007-09-10 11:58:07 Re: Include Lists for Text Search
Previous Message Simon Riggs 2007-09-10 10:50:53 Re: A Silly Idea for Vertically-Oriented Databases