From: | Atri Sharma <atri(dot)jiit(at)gmail(dot)com> |
---|---|
To: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Bloom Filter lookup for hash joins |
Date: | 2013-06-26 06:46:03 |
Message-ID: | CAOeZVif-R-iLF966wEipk5By-KhzVLOqpWqurpaK3p5fYw-Rdw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi All,
I have been researching bloom filters and discussed it on IRC with
RhodiumToad and David Fetter, and they pointed me to the various
places that could potentially have bloom filters, apart from the
places that already have them currently.
I have been reading the current implementation of hash joins, and in
ExecScanHashBucket, which I understand is the actual lookup function,
we could potentially look at a bloom filter per bucket. Instead of
actually looking up each hash value for the outer relation, we could
just check the corresponding bloom filter for that bucket, and if we
get a positive, then lookup the actual values i.e. continue with our
current behaviour (since we could be looking at a false positive).
This doesn't involve a lot of new logic. We need to add a bloom filter
in HashJoinState and set it when the hash table is being made. Also,
one thing we need to look at is the set of hash functions being used
for the bloom filters. This can be a point of further discussion.
The major potential benefit we could gain is that bloom filters never
give false negatives. So, we could save a lot of lookup where the
corresponding bloom filter gives a negative.
This approach can also be particularly useful for hash anti joins,
where we look for negatives. Since bloom filters can easily be stored
and processed, by straight bit operations, we could be looking at a
lot of saving here.
I am not sure if we already do something like this. Please direct me
to the relevant parts in the code if we already do.
Regards,
Atri
--
Regards,
Atri
l'apprenant
From | Date | Subject | |
---|---|---|---|
Next Message | Robins Tharakan | 2013-06-26 06:55:53 | Re: Add more regression tests for dbcommands |
Previous Message | Hari Babu | 2013-06-26 05:36:00 | Re: fixing pg_ctl with relative paths |