From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: WIP: bloom filter in Hash Joins with batches |
Date: | 2015-12-17 16:00:47 |
Message-ID: | 5672DC2F.20205@2ndquadrant.com |
Lists: | pgsql-hackers |
On 12/17/2015 11:44 AM, Simon Riggs wrote:
>
> My understanding is that the bloom filter would be ineffective in any of
> these cases
> * Hash table is too small
Yes, although it depends what you mean by "too small".
Essentially, if we can make do with a single batch, then it's cheaper to do a
single lookup in the hash table instead of multiple lookups in the bloom
filter. The bloom filter might still win if it fits into L3 cache, but
that seems rather unlikely.
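
To illustrate the "multiple lookups" point, here is a minimal bloom filter sketch. It is not code from the patch: the class name, the SHA-256 based double hashing, and the sizes are all assumptions chosen for illustration. It only shows that a single probe has to touch nhashes separate bit positions, whereas a hash table lookup in the single-batch case is one bucket access.

```python
import hashlib


class BloomFilter:
    """Illustrative bloom filter; names and hashing scheme are hypothetical."""

    def __init__(self, nbits, nhashes):
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = bytearray((nbits + 7) // 8)

    def _positions(self, key):
        # Derive nhashes bit positions from two base hashes (double hashing).
        digest = hashlib.sha256(repr(key).encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.nbits for i in range(self.nhashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # Every probe checks up to nhashes scattered bits in the bit array.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


# Build over the "inner" side, then probe with "outer" keys.
bloom = BloomFilter(nbits=10_000, nhashes=4)
for inner_key in range(1000):
    bloom.add(inner_key)

print(bloom.might_contain(10))    # inserted keys are never rejected
print(len(bloom._positions(10)))  # bit positions touched per probe
```

Each of those positions may land on a different cache line, which is why the filter only has a chance of winning here if the whole bit array stays cache-resident.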
> * Bloom filter too large
Too large with respect to what?
One obvious problem is that the bloom filter is built for all batches at
once, i.e. for all tuples, so it may be so big that it won't fit into
work_mem (or it may take a significant part of it). Currently it's not
accounted for, but that'll need to change.
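
A rough back-of-the-envelope sketch of that sizing concern, using the textbook bloom filter formulas rather than anything taken from the patch. The 5% false positive target and the tuple counts are assumptions picked for illustration; 4MB is the default work_mem.

```python
import math


def bloom_filter_bytes(ntuples, false_positive_rate):
    """Textbook optimal size: m = -n * ln(p) / (ln 2)^2 bits, returned in bytes."""
    bits = -ntuples * math.log(false_positive_rate) / (math.log(2) ** 2)
    return math.ceil(bits / 8)


def optimal_hash_count(ntuples, nbytes):
    """Textbook optimal number of hash functions: k = (m / n) * ln 2."""
    return max(1, round((nbytes * 8 / ntuples) * math.log(2)))


WORK_MEM_BYTES = 4 * 1024 * 1024  # default work_mem (4MB)
TARGET_FPR = 0.05                 # assumed target, not from the patch

# The filter covers all tuples of the inner relation, across all batches.
for ntuples in (100_000, 1_000_000, 10_000_000):
    size = bloom_filter_bytes(ntuples, TARGET_FPR)
    share = size / WORK_MEM_BYTES
    print(f"{ntuples:>10} tuples: {size:>9} bytes, {share:.0%} of work_mem")
```

Because the size grows linearly with the total number of inner tuples while work_mem bounds only a single batch, a many-batch join can easily end up with a filter that rivals or exceeds work_mem.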
> * Bloom selectivity > 50% - perhaps that can be applied dynamically,
> so stop using it if it becomes ineffective
Yes. I think doing some preliminary selectivity estimation should not be
difficult - that's pretty much what calc_joinrel_size_estimate() already
does.
Doing that dynamically is also possible, but quite tricky. Imagine
for example the outer relation is sorted - in that case we may get long
sequences of the same value (hash), and all of them will either have a
match in the inner relation, or not have a match. That may easily skew
the counters used for disabling the bloom filter dynamically.
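
A small simulation of that skew, under stated assumptions: the windowed counter, the window size, and the function name are hypothetical (the patch has no such logic yet), the 50% threshold is the one from the quoted suggestion, and an exact set stands in for the bloom filter to keep the sketch short.

```python
import random


def disable_point(outer_keys, inner_keys, window=1000, threshold=0.5):
    """Return the tuple count at which the filter would be switched off.

    After every `window` outer tuples, the observed match fraction is
    compared against `threshold`. Returns None if it is never exceeded.
    """
    matches = 0
    for i, key in enumerate(outer_keys, start=1):
        if key in inner_keys:
            matches += 1
        if i % window == 0:
            if matches / window > threshold:
                return i
            matches = 0
    return None


# 100 distinct outer values, 1000 tuples each; only 30% of the values have
# a match, so overall the filter would reject 70% of the outer tuples.
inner_keys = set(range(30))
sorted_outer = [value for value in range(100) for _ in range(1000)]
shuffled_outer = sorted_outer[:]
random.Random(0).shuffle(shuffled_outer)

print("sorted:  ", disable_point(sorted_outer, inner_keys))
print("shuffled:", disable_point(shuffled_outer, inner_keys))
```

With sorted input the first window consists entirely of one matching value, so the counter sees 100% matches and gives up on a filter that would have been useful for most of the remaining tuples. The shuffled input has the same data but each window reflects the overall match rate.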
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services