From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Boom filters for hash joins (was: A design for amcheck heapam verification)
Date: 2017-09-18 17:52:40
Message-ID: CAH2-WzktE8=rzyDKyR7+kQ7sUpeoomH5op+3ga6JyCENGo9HQg@mail.gmail.com
Lists: pgsql-hackers
On Mon, Sep 18, 2017 at 10:29 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Aug 29, 2017 at 10:22 PM, Thomas Munro
>> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>>> (2) We could push a Bloom filter down to scans
>>> (many other databases do this, and at least one person has tried this
>>> with PostgreSQL and found it to pay off[1]).
>
>> I think the hard part is going to be figuring out a query planner
>> framework for this, because pushing the Bloom filter down to the
>> scan changes the cost and the row-count of the scan.
>
> Uh, why does the planner need to be involved at all? This seems like
> strictly an execution-time optimization. Even if you wanted to try
> to account for it in costing, I think the reliability of the estimate
> would be nil, never mind any questions about whether the planner's
> structure makes it easy to apply such an adjustment.
That is how I think about it too, though I'm not at all confident of
being on the right track. (I am fairly confident that using a Bloom
filter for parallel hash join is a good idea in general, though.)
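
To make that concrete, here is a minimal, self-contained sketch of the
kind of filter the hash join's build side could populate and the outer
scan could consult. To be clear, this is illustrative code I am making
up here (including the names bloom_add and bloom_might_contain); it is
not any actual executor interface:

#include <stdbool.h>
#include <stdint.h>

typedef struct BloomFilter
{
    uint64_t   *bits;       /* bit array, nbits wide */
    uint64_t    nbits;      /* number of bits; must be a power of two */
    int         nhashes;    /* number of hash functions (k) */
} BloomFilter;

/*
 * Derive the i-th probe position from a single 64-bit hash of the join
 * key, using the Kirsch-Mitzenmacher double hashing trick.
 */
static inline uint64_t
bloom_pos(uint64_t hash, int i, uint64_t nbits)
{
    uint64_t    h1 = hash;
    uint64_t    h2 = (hash >> 32) | (hash << 32);

    return (h1 + (uint64_t) i * h2) & (nbits - 1);
}

/* Called once per inner-side tuple while building the hash table. */
static void
bloom_add(BloomFilter *bf, uint64_t hash)
{
    int         i;

    for (i = 0; i < bf->nhashes; i++)
    {
        uint64_t    pos = bloom_pos(hash, i, bf->nbits);

        bf->bits[pos / 64] |= UINT64_C(1) << (pos % 64);
    }
}

/*
 * Called from the outer-side scan.  A false return means the key is
 * definitely absent from the inner side, so the tuple can be skipped
 * without touching the hash table at all.
 */
static bool
bloom_might_contain(const BloomFilter *bf, uint64_t hash)
{
    int         i;

    for (i = 0; i < bf->nhashes; i++)
    {
        uint64_t    pos = bloom_pos(hash, i, bf->nbits);

        if ((bf->bits[pos / 64] & (UINT64_C(1) << (pos % 64))) == 0)
            return false;
    }
    return true;
}

The idea would be that the build phase feeds each inner tuple's join
key hash to bloom_add(), after which the outer side can discard tuples
for which bloom_might_contain() returns false before doing any real
hash table work.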
I should point out that Bloom filters are quite insensitive to
misestimation of the total number of elements, an estimate of which
must be provided up-front. (For hash join Bloom filters, a cardinality
estimate might make more sense than an estimate of the total number of
elements in a batch.) "Bloom Filters in Probabilistic Verification"
gives this as a key advantage of Bloom filters over other approaches
to formal model verification [1]. If significant misestimation costs
little of the benefit, and if the overhead of maintaining a Bloom
filter is fixed and reasonably low, then maybe it makes sense to apply
Bloom filters as an opportunistic or optimistic optimization: one
applied whenever there is little to lose but much to gain.
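
As a rough sanity check on that insensitivity claim (my own numbers,
not anything from the paper): sizing a filter for an estimated element
count and then evaluating the usual false positive approximation
fpr = (1 - e^(-k*n/m))^k at several actual counts shows that the
degradation is gradual rather than a cliff:

#include <math.h>
#include <stdio.h>

int
main(void)
{
    double      est_n = 1e6;            /* up-front element count estimate */
    double      bits_per_elem = 9.6;    /* sized for roughly a 1% fpr */
    double      m = est_n * bits_per_elem;  /* total bits in the filter */
    int         k = (int) round(bits_per_elem * log(2));    /* optimal k */
    double      mult;

    for (mult = 0.5; mult <= 4.0; mult *= 2.0)
    {
        double      n = est_n * mult;   /* actual element count */
        double      fpr = pow(1.0 - exp(-(double) k * n / m), (double) k);

        printf("actual n = %.1fx estimate: fpr = %.3f%%\n",
               mult, fpr * 100.0);
    }
    return 0;
}

With these parameters, a 2x underestimate raises the false positive
rate from about 1% to about 16%, which still rejects the large
majority of non-matching probes; only around a 4x underestimate does
the filter lose most of its value.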
[1] http://www.ccs.neu.edu/home/pete/pub/bloom-filters-verification.pdf
--
Peter Geoghegan