Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Mischa Sandberg <mischa(dot)sandberg(at)telus(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL
Date: 2005-05-11 01:51:49
Message-ID: 200505110151.j4B1pnt16211@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance


Is there a TODO anywhere in this discussion? If so, please let me know.

---------------------------------------------------------------------------

Mischa Sandberg wrote:
> Quoting Mark Lewis <mark(dot)lewis(at)mir3(dot)com>:
>
> > If the original paper was published in 1984, then it's been more than
> > 20 years. Any potential patents would already have expired, no?
>
> Don't know, but the idea is pervasive among different vendors ...
> perhaps that's a clue.
>
> And having now read beyond the start of ExecHashJoin(), I can see that
> PG does indeed implement Grace hash; and the implementation is nice and
> clean.
>
> If there were room for improvement, (and I didn't see it in the source)
> it would be the logic to:
>
> - swap inner and outer inputs (batches) when the original inner turned
> out to be too large for memory, and the corresponding outer did not. If
> you implement that anyway (complicates the loops) then it's no trouble
> to just hash the smaller of the two, every time; saves some CPU.
>
> - recursively partition batches where both inner and outer input batch
> ends up being too large for memory, too; or where the required number of
> batch output buffers alone is too large for working RAM. This is only
> for REALLY big inputs.
>
> Note that you don't need a bad hash function to get skewed batch sizes;
> you only need a skew distribution of the values being hashed.
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Mischa Sandberg 2005-05-11 02:02:02 Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL
Previous Message Bob Lee 2005-05-11 01:32:39 Age() calculation question

Browse pgsql-performance by date

  From Date Subject
Next Message Christopher Kings-Lynne 2005-05-11 01:59:14 Re: Partitioning / Clustering
Previous Message Joshua D. Drake 2005-05-11 01:48:48 Re: Partitioning / Clustering