From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash Joins vs. Bloom Filters / take 2
Date: 2018-02-21 01:10:07
Message-ID: CAH2-WznzKA=+Cp8W0zoQ0EU+uH5ajGXM5qzGfqAuD1-s949Pbg@mail.gmail.com
Lists: pgsql-hackers
On Tue, Feb 20, 2018 at 3:54 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> I suspect that it could make sense to use a Bloom filter to
>> summarize the entire inner side of the join all at once, even when
>> there are multiple batches. I also suspect that this is particularly
>> beneficial with parallel hash joins, where IPC/synchronization
>> overhead can be a big problem.
>>
>
> But that's what the patch does, currently - the filter is built during
> the initial pass through the data, and then used for all batches.
I misunderstood. I would probably do something like double or triple
the original rows estimate instead, though. The estimate must be at
least slightly inaccurate when we get to this point, but I don't think
that that's a good enough reason to give up on the estimate
completely.
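For illustration only -- none of this is the patch's code, and the constants are just placeholders -- sizing the filter from a padded copy of the planner's estimate (rather than abandoning the estimate entirely) could look something like this, using the textbook formulas m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2:

```c
#include <math.h>
#include <stdio.h>

/*
 * Illustration only, not the patch's code.  Size a Bloom filter from the
 * planner's row estimate padded by a fudge factor (e.g. 2x-3x), instead of
 * giving up on the estimate entirely.
 */
static void
size_bloom_filter(double est_rows, double fudge_factor, double target_fpr)
{
	double		n = est_rows * fudge_factor;
	double		m = ceil(-n * log(target_fpr) / pow(log(2.0), 2.0)); /* bits */
	int			k = (int) round((m / n) * log(2.0)); /* hash functions */

	printf("n = %.0f, filter = %.2f MB, k = %d\n",
		   n, m / (8.0 * 1024.0 * 1024.0), k);
}

int
main(void)
{
	/* e.g. planner estimated 40 million rows; pad by 2x, aim for 10% FPR */
	size_bloom_filter(40e6, 2.0, 0.10);
	return 0;
}
```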
> Actually, now that I think about it - I think the patch should throw
> away the filter after the initial pass over the outer relation,
> because at that point we've used all the information in the filter.
Makes sense.
> I'm not sure it would make sense to then build smaller bloom filters for
> individual batches, but maybe it would?
I doubt it.
> Yeah, I admit those are rather crude rules.
You have to start somewhere.
> The trouble is that when we start with a single batch and then find out
> the estimate was wrong and we need to start batching, all bets are off.
> At that point it seems reasonable to just say "Here is X MBs of RAM, do
> what you can".
As I said above, I wouldn't say all bets are off when this happens --
not at all. Estimates are likely to often be somewhat wrong. If
they're completely wrong, we can probably swallow the cost of giving
up on a Bloom filter relatively late.
As I said, X should not be a portion of work_mem, because that has
only a weak relationship to what really matters.
>> You should try to exploit the fact that a Bloom filter can summarize a
>> large set reasonably well with a very compact, simple representation.
>> A false positive rate of 10% sounds a lot worse than 1% or 0.1%, but
>> for cases where Bloom probes will save a lot of work, it probably
>> doesn't make all that much difference -- the hash join is still much
>> faster.
> But the problem is that I don't know what the total size of the hash
> table is, because we're building the bloom filter for all the batches at
> once. And we don't know how many batches there will be - if we knew
> that, we could estimate the number of distinct values and we could use
> that to size the filter instead of doing this. (All of this only applies
> to the case where we start with a single batch and then find out we
> need to start batching, of course.)
I don't think that the number of batches should matter that much to
the costing/sizing/estimation logic, even if it's an interesting thing
to talk about when considering the upsides of a Bloom filter. My sense
is that we should focus on making sure that using a Bloom filter is
going to win, and not worry so much about whether that's going to be a
huge win or a modest win.
Suppose that you thought you were going to have a 10% false positive
rate with a 22.85MB Bloom filter for 40 million elements (my earlier
example). Further suppose that it turns out to be 80 million elements.
This means that you're going to get a false positive rate of 30%. This
could still be a big win, though, so it's not really a bad situation.
With 120 million elements, it's about 45%, but even then it could
still work out, especially because the hash join will be very slow
anyway. You also have to bear in mind that the 40 million estimate is
much more likely to be too high than too low, because you're assuming
distinct key values for the hash table.
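To make the arithmetic concrete, here's a rough standalone sketch (not from the patch) using the standard approximation p ~= (1 - e^(-k*n/m))^k. The exact figures depend on the number of hash functions the filter was actually built with, so the output won't line up exactly with the ballpark percentages above, but the shape of the degradation is the point:

```c
#include <math.h>
#include <stdio.h>

int
main(void)
{
	/* ~22.85MB of bits, as sized for 40 million elements at ~10% FPR */
	double		m = 22.85 * 1024.0 * 1024.0 * 8.0;
	int			k = 3;			/* assumed number of hash functions */
	double		counts[] = {40e6, 80e6, 120e6};

	for (int i = 0; i < 3; i++)
	{
		double		n = counts[i];
		/* standard false positive rate approximation */
		double		p = pow(1.0 - exp(-(double) k * n / m), (double) k);

		printf("%.0f million elements: ~%.0f%% false positives\n",
			   n / 1e6, p * 100.0);
	}
	return 0;
}
```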
You're taking a risk, in a sense, but a lot of things have to go wrong
for you to lose, and even then you can cut your losses before the
extra cost is too high.
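Purely as a hypothetical sketch of what "cutting your losses" could mean (not something the patch necessarily does): track how many outer tuples the filter actually rejects, and stop consulting it once the observed rejection rate over an initial sample is too low to pay for the probes:

```c
/*
 * Hypothetical illustration, not the patch's logic: keep a running count of
 * how many outer tuples the Bloom filter lets us skip, and stop consulting
 * it once the observed rejection rate is too low to pay for the overhead.
 */
#include <stdbool.h>
#include <stdio.h>

#define BLOOM_SAMPLE_SIZE		100000	/* tuples to observe before deciding */
#define BLOOM_MIN_REJECT_RATE	0.05	/* give up below this rejection rate */

typedef struct BloomUsage
{
	long		probes;			/* outer tuples checked against the filter */
	long		rejects;		/* tuples the filter let us skip */
	bool		enabled;		/* still worth consulting? */
} BloomUsage;

static bool
bloom_still_useful(BloomUsage *usage)
{
	if (usage->enabled &&
		usage->probes >= BLOOM_SAMPLE_SIZE &&
		(double) usage->rejects / usage->probes < BLOOM_MIN_REJECT_RATE)
		usage->enabled = false; /* cut our losses */

	return usage->enabled;
}

int
main(void)
{
	BloomUsage	usage = {.probes = 200000, .rejects = 3000, .enabled = true};

	/* 1.5% rejection rate over 200k probes: not worth it, so disable */
	printf("keep probing filter? %s\n",
		   bloom_still_useful(&usage) ? "yes" : "no");
	return 0;
}
```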
Do you have a test query for this patch that gives us some idea of the upside?
--
Peter Geoghegan