From: | Greg Stark <greg(dot)stark(at)enterprisedb(dot)com> |
---|---|
To: | "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com> |
Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: WIP: Hash Join-Filter Pruning using Bloom Filters |
Date: | 2008-11-02 23:28:49 |
Message-ID: | 29F3D1BF-141D-4EDA-B9FD-247CB84A5130@enterprisedb.com |
Lists: | pgsql-hackers |
I think the k hash functions are normally just different slices of
bits taken from one actual hash function anyway, so it sounds like
you've done the right thing.
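To make the bit-slicing idea concrete, here is a minimal sketch of a Bloom filter whose k positions are slices of a single 32-bit hash value. It is only an illustration: `hash32`, `slice_positions`, and `SlicedBloom` are hypothetical names, the blake2b-based hash is a stand-in for whatever hash value the executor already computes, and none of this is taken from the patch.

```python
import hashlib


def hash32(key: bytes) -> int:
    """Stand-in 32-bit hash (assumption: any well-mixed 32-bit hash would do)."""
    return int.from_bytes(hashlib.blake2b(key, digest_size=4).digest(), "big")


def slice_positions(hashvalue, k, bits_per_slice):
    """Split one hash value into k bit slices; each slice is a filter position."""
    assert k * bits_per_slice <= 32, "not enough hash bits for k slices"
    mask = (1 << bits_per_slice) - 1
    return [(hashvalue >> (i * bits_per_slice)) & mask for i in range(k)]


class SlicedBloom:
    """Bloom filter of 2**bits_per_slice bits, indexed by k slices of one hash."""

    def __init__(self, k, bits_per_slice):
        self.k = k
        self.bits_per_slice = bits_per_slice
        # At least one byte, even for very small filters.
        self.bits = bytearray((1 << bits_per_slice) // 8 or 1)

    def add(self, hashvalue):
        for pos in slice_positions(hashvalue, self.k, self.bits_per_slice):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, hashvalue):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7))
            for pos in slice_positions(hashvalue, self.k, self.bits_per_slice)
        )
```

With k = 1 this degenerates to setting a single bit per hash value. Larger k spends more of the hash bits per key, so with a 32-bit hash there is a hard limit of k * bits_per_slice <= 32.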
This sounds most interesting for multibatch hash joins, where you
could build a Bloom filter for the future batches to avoid having to
spool tuples that will never match.
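A rough sketch of what that could look like, assuming an inner join (so an outer tuple with no possible match can simply be dropped) and one small filter per batch. The names `NBATCH`, `FILTER_BITS`, `build_filters`, and `route_outer`, and the use of CRC32 as the hash, are all made up for illustration and do not reflect how the executor actually assigns batches.

```python
import zlib

NBATCH = 4             # hypothetical number of batches
FILTER_BITS = 1 << 12  # hypothetical filter size per batch, in bits


def hash_key(key):
    """Stand-in hash; a real join would reuse the tuple's join hash value."""
    return zlib.crc32(repr(key).encode())


def batch_of(h):
    return h % NBATCH


def bit_of(h):
    # Use hash bits above the ones that picked the batch.
    return (h // NBATCH) % FILTER_BITS


def build_filters(inner_keys):
    """While partitioning the inner side, record each key in its batch's filter."""
    filters = [bytearray(FILTER_BITS // 8) for _ in range(NBATCH)]
    for key in inner_keys:
        h = hash_key(key)
        pos = bit_of(h)
        filters[batch_of(h)][pos >> 3] |= 1 << (pos & 7)
    return filters


def route_outer(outer_keys, filters):
    """Probe batch 0 now; spool a future-batch tuple only if it might match."""
    probe_now, spooled, pruned = [], [], 0
    for key in outer_keys:
        h = hash_key(key)
        b = batch_of(h)
        if b == 0:
            probe_now.append(key)      # current batch: probe the hash table
            continue
        pos = bit_of(h)
        if filters[b][pos >> 3] & (1 << (pos & 7)):
            spooled.append((b, key))   # possible match: write to the batch file
        else:
            pruned += 1                # definitely no match: never spooled
    return probe_now, spooled, pruned
```

The saving here is in temp-file I/O rather than bucket searches: every pruned tuple is one that would otherwise be written out and read back only to find nothing. False positives just cost a wasted spool, so correctness does not depend on the filter size.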
greg
On 2 Nov 2008, at 09:49 PM, "Jonah H. Harris" <jonah(dot)harris(at)gmail(dot)com>
wrote:
> All,
>
> Attached is an initial patch I've been playing with which uses Bloom
> filters to reduce unnecessary processing of outer tuples in hash
> joins. In short, this works by creating a Bloom filter, adding all
> relevant tuples for the inner relation, and querying the filter (for
> existence) when retrieving tuples from the outer relation. This
> avoids unnecessary tuple movement and bucket searches for matches we
> already know can't exist. Currently it works only for JOIN_INNER, but
> could be modified to optimize anti/semi joins as well. Similarly, I
> created a GUC to enable pruning, named bloom_pruning.
>
> Rather than performing k hash functions, this implementation simply
> sets a bit based on the already-computed hash value. I wanted to send
> this around for reviews and comments before working on it further. As
> this isn't overly intrusive, if someone can commit to reviewing and
> providing input, I'll commit to having this ready for 8.4.
>
> --
> Jonah H. Harris, Senior DBA
> myYearbook.com
> <bloompruning_v1.patch>
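For readers following the thread, the build-and-probe scheme described in the quoted message can be sketched roughly as below: one bit per inner tuple, derived from the already-computed hash value, and a filter check before each bucket search on the outer side. This is not the attached patch. `hash_join_inner`, `NBUCKETS`, `FILTER_BITS`, and the CRC32 hash are assumptions made for the example, and the `bloom_pruning` argument only mimics the on/off behaviour of the GUC mentioned above.

```python
import zlib
from collections import defaultdict

NBUCKETS = 64          # hypothetical hash table size
FILTER_BITS = 1 << 10  # hypothetical filter size, in bits


def hash_key(key):
    """Stand-in for the join hash value the executor already computes."""
    return zlib.crc32(repr(key).encode())


def filter_pos(h):
    # One bit per tuple, taken from hash bits above the bucket bits.
    return (h // NBUCKETS) % FILTER_BITS


def hash_join_inner(inner, outer, bloom_pruning=True):
    """Inner join of (key, payload) lists; returns (rows, bucket_searches)."""
    buckets = defaultdict(list)
    bloom = bytearray(FILTER_BITS // 8)

    # Build phase: load the hash table and set one filter bit per inner tuple.
    for key, payload in inner:
        h = hash_key(key)
        buckets[h % NBUCKETS].append((h, key, payload))
        pos = filter_pos(h)
        bloom[pos >> 3] |= 1 << (pos & 7)

    # Probe phase: skip the bucket search when the filter rules out a match.
    rows, searches = [], 0
    for key, payload in outer:
        h = hash_key(key)
        pos = filter_pos(h)
        if bloom_pruning and not (bloom[pos >> 3] & (1 << (pos & 7))):
            continue
        searches += 1
        for ih, ikey, ipayload in buckets.get(h % NBUCKETS, ()):
            if ih == h and ikey == key:
                rows.append((key, ipayload, payload))
    return rows, searches
```

Running it with `bloom_pruning` on and off should give the same rows, with fewer bucket searches when pruning is enabled. How much that helps in practice depends on how selective the join is and on the filter size relative to the inner relation.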