From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
Cc: | "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: WIP: bloom filter in Hash Joins with batches |
Date: | 2015-12-28 10:44:15 |
Message-ID: | 5681127F.9080301@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 12/28/2015 11:38 AM, David Rowley wrote:
> On 28 December 2015 at 23:23, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com
> <mailto:tomas(dot)vondra(at)2ndquadrant(dot)com>> wrote:
>
> On 12/28/2015 03:15 AM, David Rowley wrote:
>
> Maybe it would be better to, once the filter is built, simply
> count the
>
> number of 1 bits and only use the filter if there's less than
> <threshold> 1 bits compared to the size of the filter in bits.
> There's
> functionality in bms_num_members() to do this, and there's
> also __builtin_popcount() in newer version of GCC, which we
> could have
> some wrapper around, perhaps.
>
>
> I don't really see how this is relevant with the previous point. The
> number of 1s in the bitmap can tell you the false positive rate for
> the bloom filter, not what fraction of lookups will be answered with
> "true".
>
> So while this needs to be watched, so that we stop using the bloom
> filter when it gets too full (e.g. due to under-estimate), it's
> completely unrelated to the previous issue.
>
>
> Why is it not related? this has got me confused. If a bloom filter has
> all of the bits set to 1, then it will never filter any Tuples right?
Because the false positive rate can be computed without having to look
at the lookups. So it's inherently independent on the ordering of outer
relation, and so on.
> If so, then a filter with all 1 bits set should be thrown away, as
> it'll never help us, and the filter should generally become more
> worthwhile as it contains a higher ratio of 0 bits vs 1 bits. Of
> course we don't have a count of how many Tuples matched each bit, so
> this is based on the assumption that each bit matches an equal number
> of Tuples. Are you saying this is not an assumption that we should
> make?
Sure we should check that. All I'm saying is it has nothing to do with
the first problem described in the first part of the e-mail.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | David Rowley | 2015-12-28 10:52:29 | Re: WIP: bloom filter in Hash Joins with batches |
Previous Message | David Rowley | 2015-12-28 10:38:16 | Re: WIP: bloom filter in Hash Joins with batches |