Re: Bloom filter Pushdown Optimization for Merge Join

From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: Lyu Pan <lyu(dot)steve(dot)pan(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Bloom filter Pushdown Optimization for Merge Join
Date: 2022-10-13 14:30:11
Message-ID: CALNJ-vS-6ckvKdBd0NP4aG01F-zRxVCuJfWOWEkWgxKni_DRUA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 12, 2022 at 4:35 PM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:

>
>
> On Wed, Oct 12, 2022 at 3:36 PM Lyu Pan <lyu(dot)steve(dot)pan(at)gmail(dot)com> wrote:
>
>> Hello Zhihong Yu & Tomas Vondra,
>>
>> Thank you so much for your review and feedback!
>>
>> We made some updates based on previous feedback and attached the new
>> patch set. Due to time constraints, we didn't get to resolve all the
>> comments, and we'll continue to improve this patch.
>>
>> > In this prototype, the cost model is based on an assumption that there
>> is
>> > a linear relationship between the performance gain from using a semijoin
>> > filter and the estimated filtering rate:
>> > % improvement to Merge Join cost = 0.83 * estimated filtering rate -
>> 0.137.
>> >
>> > How were the coefficients (0.83 and 0.137) determined ?
>> > I guess they were based on the results of running certain workload.
>>
>> Right, the coefficients (0.83 and 0.137) determined are based on some
>> preliminary testings. The current costing model is pretty naive and
>> we'll work on a more robust costing model in future work.
>>
>>
>> > I agree, in principle, although I think the current logic / formula is a
>> > bit too crude and fitted to the simple data used in the test. I think
>> > this needs to be formulated as a regular costing issue, considering
>> > stuff like cost of the hash functions, and so on.
>> >
>> > I think this needs to do two things:
>> >
>> > 1) estimate the cost of building the bloom filter - This shall depend on
>> > the number of rows in the inner relation, number/cost of the hash
>> > functions (which may be higher for some data types), etc.
>> >
>> > 2) estimate improvement for the probing branch - Essentially, we need to
>> > estimate how much we save by filtering some of the rows, but this also
>> > neeeds to include the cost of probing the bloom filter.
>> >
>> > This will probably require some improvements to the lib/bloomfilter, in
>> > order to estimate the false positive rate - this may matter a lot for
>> > large data sets and small work_mem values. The bloomfilter library
>> > simply reduces the size of the bloom filter, which increases the false
>> > positive rate. At some point it'll start reducing the benefit.
>> >
>>
>> These suggestions make a lot of sense. The current costing model is
>> definitely not good enough, and we plan to work on a more robust
>> costing model as we continue to improve the patch.
>>
>>
>> > OK. Could also build the bloom filter in shared memory?
>> >
>>
>> We thought about this approach but didn't prefer this one because if
>> all worker processes share the same bloom filter in shared memory, we
>> need to frequently lock and unlock the bloom filter to avoid race
>> conditions. So we decided to have each worker process create its own
>> bloom filter.
>>
>>
>> > IMHO we shouldn't make too many conclusions from these examples. Yes, it
>> > shows merge join can be improved, but for cases where a hashjoin works
>> > better so we wouldn't use merge join anyway.
>> >
>> > I think we should try constructing examples where either merge join wins
>> > already (and gets further improved by the bloom filter), or would lose
>> > to hash join and the bloom filter improves it enough to win.
>> >
>> > AFAICS that requires a join of two large tables - large enough that hash
>> > join would need to be batched, or pre-sorted inputs (which eliminates
>> > the explicit Sort, which is the main cost in most cases).
>> >
>> > The current patch only works with sequential scans, which eliminates the
>> > second (pre-sorted) option. So let's try the first one - can we invent
>> > an example with a join of two large tables where a merge join would win?
>> >
>> > Can we find such example in existing benchmarks like TPC-H/TPC-DS.
>> >
>>
>> Agreed. The current examples are only intended to show us that using
>> bloom filters in merge join could improve the merge join performance
>> in some cases. We are working on testing more examples that merge join
>> with bloom filter could out-perform hash join, which should be more
>> persuasive.
>>
>>
>> > The bloom filter is built by the first seqscan (on t0), and then used by
>> > the second seqscan (on t1). But this only works because we always run
>> > the t0 scan to completion (because we're feeding it into Sort) before we
>> > start scanning t1.
>> >
>> > But when the scan on t1 switches to an index scan, it's over - we'd be
>> > building the filter without being able to probe it, and when we finish
>> > building it we no longer need it. So this seems pretty futile.
>> >
>> > It might still improve plans like
>> >
>> > -> Merge Join
>> > Merge Cond: (t0.c1 = t1.c1)
>> > SemiJoin Filter Created Based on: (t0.c1 = t1.c1)
>> > SemiJoin Estimated Filtering Rate: 1.0000
>> > -> Sort
>> > Sort Key: t0.c1
>> > -> Seq Scan on t0
>> > -> Index Scan on t1
>> >
>> > But I don't know how common/likely that actually is. I'd expect to have
>> > an index on both sides, but perhaps I'm wrong.
>> >
>> > This is why hashjoin seems like a more natural fit for the bloom filter,
>> > BTW, because there we have a guarantee the inner relation is processed
>> > first (so we know the bloom filter is fine and can be probed).
>> >
>>
>> Great observation. The bloom filter only works if the first SeqScan
>> always runs to completion before the second SeqScan starts.
>> I guess one possible way to avoid futile bloom filter might be
>> enforcing that the bloom filter only be used if both the outer/inner
>> plans of the MergeJoin are Sort nodes, to guarantee the bloom filter
>> is ready to use after processing one side of the join, but this may be
>> too restrictive.
>>
>>
>> > I don't know what improvements you have in mind exactly, but I think
>> > it'd be good to show which node is building/using a bloom filter, and
>> > then also some basic stats (size, number of hash functions, FPR, number
>> > of probes, ...). This may require improvements to lib/bloomfilter, which
>> > currently does not expose some of the details.
>> >
>>
>> Along with the new patch set, we have added information to display
>> which node is building/using a bloom filter (as well as the
>> corresponding expressions), and some bloom filter basic stats. We'll
>> add more related information (e.g. FPR) as we modify lib/bloomfilter
>> implementation in future work.
>>
>>
>> Thanks again for the great reviews and they are really useful! More
>> feedback is always welcome and appreciated!
>>
>> Regards,
>> Lyu Pan
>> Amazon RDS/Aurora for PostgreSQL
>>
>> Hi,
> For v1-0001-Support-semijoin-filter-in-the-planner-optimizer.patch :
>
> +
> + /* want at least 1000 rows_filtered to avoid any nasty edge cases
> */
> + if (force_mergejoin_semijoin_filter ||
> + (filtering_rate >= 0.35 && rows_filtered > 1000 &&
> best_filter_clause >= 0))
>
> Currently rows_filtered is compared with a constant, should the constant
> be made configurable ?
>
> + * improvement of 19.5%. This equation also concludes thata a
> 17%
>
> Typo: thata
>
> + inner_md_array = palloc(sizeof(SemiJoinFilterExprMetadata) * num_md);
> + if (!outer_md_array || !inner_md_array)
> + {
> + return 0; /* a stack array allocation failed */
>
> Should the allocated array be freed before returning ?
>
> For verify_valid_pushdown(), the parameters in comment don't match the
> actual parameters. Please modify the comment.
>
> For is_fk_pk(), since the outer_key_list is fixed across the iterations, I
> think all_vars can be computed outside of expressions_match_foreign_key().
>
> Cheers
>

Continuing
with v1-0001-Support-semijoin-filter-in-the-planner-optimizer.patch

For get_switched_clauses(), the returned List contains all the clauses. Yet
the name suggests that only switched clauses are returned.
Please rename the method to adjust_XX or rearrange_YY so that it is less
likely to cause confusion.

For depth_of_semijoin_target(), idx is only used inside the `if (num_exprs
== get_appendrel_occluded_references(` block, you can move its declaration
inside that if block.

+ outside_subq_rte = root->simple_rte_array[outside_subq_var->varno];
+
+ /* System Vars have varattno < 0, don't bother */
+ if (outside_subq_var->varattno <= 0)
+ return 0;

Since the check on outside_subq_var->varattno doesn't depend on
outside_subq_rte, you can move the assignment to outside_subq_rte after the
if check.

Cheers

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-10-13 15:23:18 Re: PG upgrade 14->15 fails - database contains our own extension
Previous Message Robert Haas 2022-10-13 14:27:56 Re: PG upgrade 14->15 fails - database contains our own extension