From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Boom filters for hash joins (was: A design for amcheck heapam verification) |
Date: | 2017-09-19 13:28:00 |
Message-ID: | 9b9fd273-18e7-2b07-7aa1-4b00ab59b8d1@2ndquadrant.com |
Lists: | pgsql-hackers |
Hi,
On 09/19/2017 02:55 AM, Robert Haas wrote:
> On Mon, Sep 18, 2017 at 5:13 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>> On Mon, Sep 18, 2017 at 2:07 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Mon, Sep 18, 2017 at 1:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> Uh, why does the planner need to be involved at all?
>>>
>>> Because it loses if the Bloom filter fails to filter anything. That's
>>> not at all far-fetched; consider SELECT * FROM a.x, b.x WHERE a.x =
>>> b.x given a foreign key on a.x referencing b(x).
>>
>> Wouldn't a merge join be a lot more likely in this case anyway? Low
>> selectivity hash joins with multiple batches are inherently slow; the
>> wasted overhead of using a bloom filter may not matter.
>>
>> Obviously this is all pretty speculative. I suspect that this could be
>> true, and it seems worth investigating that framing of the problem
>> first.
>
> ISTR Tomas Vondra doing some experiments with this a few years ago and
> finding that it was, in fact, a problem.
>
You seem to have a better memory than I do, but you're right: I did some
experiments with this in 2015. The WIP patch and discussion are here:
https://www.postgresql.org/message-id/5670946E.8070705@2ndquadrant.com
The whole idea was that with a Bloom filter we can reduce the number of
tuples (from the outer relation) written to batches.
The patch is fairly simple, and did not try to push the Bloom filters to
scan nodes or anything like that. It might be a meaningful first step,
though, particularly for selective joins (where only a small number of
rows from the outer relation have a match in the hash table).
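To make the idea concrete, here is a rough sketch in Python. It is not the code from the WIP patch; the names `BloomFilter` and `filter_outer`, the filter size and the hashing scheme are all made up for illustration. It builds a filter over the inner-side join keys and then drops outer tuples that cannot possibly match, before they would be written to a batch:

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter (sizes are arbitrary assumptions)."""

    def __init__(self, nbits=1 << 16, nhashes=3):
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = bytearray((nbits + 7) // 8)

    def _positions(self, key):
        # Derive nhashes bit positions from one digest (double hashing).
        digest = hashlib.sha256(repr(key).encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.nbits for i in range(self.nhashes)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(key))


def filter_outer(inner_keys, outer_rows, key_of):
    """Return (rows that would still be batched, number of rows skipped)."""
    bloom = BloomFilter()
    for k in inner_keys:          # filled while building the hash table
        bloom.add(k)

    kept, skipped = [], 0
    for row in outer_rows:
        if bloom.might_contain(key_of(row)):
            kept.append(row)      # may match: goes to the batch as before
        else:
            skipped += 1          # cannot match: never written to a batch
    return kept, skipped


if __name__ == "__main__":
    inner = range(100)
    outer = [(i, "payload") for i in range(10000)]
    kept, skipped = filter_outer(inner, outer, key_of=lambda r: r[0])
    print(len(kept), "rows kept,", skipped, "rows skipped")
```

With a selective join like the one in the example most outer rows are skipped. In the foreign key case Robert describes, every outer row has a match, nothing is skipped, and building and probing the filter is pure overhead.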
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services