Re: What is wrong with hashed index usage?

From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Neil Conway" <nconway(at)klamath(dot)dyndns(dot)org>, <mloftis(at)wgops(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: What is wrong with hashed index usage?
Date: 2002-06-21 18:06:50
Message-ID: D90A5A6C612A39408103E6ECDD77B82906F47B@voyager.corporate.connx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> -----Original Message-----
> From: Bruce Momjian [mailto:pgman(at)candle(dot)pha(dot)pa(dot)us]
> Sent: Friday, June 21, 2002 6:32 AM
> To: Tom Lane
> Cc: Neil Conway; mloftis(at)wgops(dot)com; Dann Corbit;
> pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] What is wrong with hashed index usage?
>
>
> Tom Lane wrote:
> > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > <para>
> > > ! Because of the limited utility of hash indexes, a
> B-tree index
> > > ! should generally be preferred over a hash index.
> We do not have
> > > ! sufficient evidence that hash indexes are actually
> faster than
> > > ! B-trees even for <literal>=</literal> comparisons.
> Moreover,
> > > ! hash indexes require coarser locks; see <xref
> > > ! linkend="locking-indexes">.
> > > </para>
> > > </note>
> > > </para>
> > > --- 181,189 ----
> > > </synopsis>
> > > <note>
> > > <para>
> > > ! Testing has shown that hash indexes are slower
> than btree indexes,
> > > ! and the size and build time for hash indexes is
> much worse. For
> > > ! these reasons, hash index use is discouraged.
> >
> > This change strikes me as a step backwards. The existing
> wording tells
> > the truth; the proposed revision removes the facts in favor
> of a blanket
> > assertion that is demonstrably false.
>
> OK, which part of is "demonstrably false"? I think the old "should
> generally be preferred" is too vague. No one has come up with a case
> where hash has shown to be faster, and a lot of cases where
> it is slower.

I agree with Tom. Maybe it is not true for PostgreSQL that hashed
indexes are better, but for every other database if you are doing single
lookups and do not need to order the items sequentially, hashed indexes
are better. What this indicates to me is that hashed indexes could
{potentially} be much better implemented for PostgreSQL.

See section 2.4:
http://citeseer.nj.nec.com/cache/papers/cs/21214/http:zSzzSzwww.cs.cmu.e
duzSz~christoszSzcourseszSz826-resourceszSzFOILS-LATEXzSzslides.pdf/inde
xing-multimedia-databases.pdf

See
http://ycmi.med.yale.edu/nadkarni/db_course/NonStd_Contents.htm

See also:
http://www-courses.cs.uiuc.edu/~cs411/RR2_goodpoints.html

From the Oracle Rdb documentation:
1.3.5 Retrieval Methods
Oracle Rdb provides several methods for retrieving or accessing data. In
the physical design of your database, consider that Oracle Rdb can use
one or more of the following methods to retrieve the rows in a table:

Sequential: locating a row or rows in sequence by retrieving data within
a logical area
Sorted index lookup with value retrieval: using the database key (dbkey)
for the value from the index to retrieve the row
Sorted index only: using data values in the index key pertinent to your
query
Hashed index retrieval: for retrieving exact data value matches
Dbkey only: retrieving a row through its dbkey
You determine the retrieval method Oracle Rdb chooses by creating one or
more sorted or hashed indexes.

Sorted index retrieval provides indexed sequential access to rows in a
table. (A sorted index is also called a B-tree index.) By contrast,
hashed index retrieval, also known as hash-addressing, provides direct
retrieval of a specific row. Retrieval of a row is based on a given
value of some set of columns in the row (called the search key).

Use a hashed index primarily for random, direct retrieval when you can
supply the entire hashed key on which the hashed index is defined, such
as an employee identification number (ID). For this kind of retrieval,
input/output operations can be significantly reduced, particularly for
tables with many rows and large indexes.

For example, to retrieve a row using a sorted index that is four levels
deep, Oracle Rdb may need to do a total of five input/output operations,
one for each level of the sorted index and one to retrieve the actual
row. By using a hashed index, the number of input/output operations may
be reduced to one or two because hashed index retrieval retrieves the
row directly.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2002-06-21 19:06:22 Re: What is wrong with hashed index usage?
Previous Message Tom Lane 2002-06-21 17:19:49 Re: Adding encrypted identd authetification