Re: Bloom filter Pushdown Optimization for Merge Join

From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: Zheng Li <zhengli10(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Bloom filter Pushdown Optimization for Merge Join
Date: 2022-10-01 07:45:11
Message-ID: CALNJ-vSrT3QBgCE_3cHVPOLpM8WnqTCDAvLPkT6rvQ4y7EjTvQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 30, 2022 at 9:20 PM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:

>
>
> On Fri, Sep 30, 2022 at 8:40 PM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
>
>>
>>
>> On Fri, Sep 30, 2022 at 3:44 PM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>>
>>> Hello,
>>>
>>> A bloom filter provides early filtering of rows that cannot be joined,
>>> discarding them before they reach the join operator; this optimization
>>> is also called a semi join filter (SJF) pushdown. Such a filter can be
>>> created when one child of the join operator must materialize its
>>> derived table before the other child is evaluated.
>>>
>>> For example, a bloom filter can be built from the join keys of the
>>> build/inner side of a hash join or the outer side of a merge join;
>>> the filter can then be used to pre-filter rows on the other side of
>>> the join operator during the scan of the base relation. The thread
>>> “Hash Joins vs. Bloom Filters / take 2” [1] is a good discussion of
>>> this optimization for hash join, though it does not cover pushing the
>>> filter down, which could further increase the performance gain.
>>>
>>> We worked on prototyping bloom filter pushdown for both hash join and
>>> merge join. Attached is a patch set for bloom filter pushdown for
>>> merge join. We also plan to send the patch for hash join once we have
>>> it rebased.
>>>
>>> Here is a summary of the patch set:
>>> 1. Bloom Filter Pushdown optimizes Merge Join by filtering rows early
>>> during the table scan rather than at the join operator.
>>> -The bloom filter is pushed down along the execution tree to
>>> the target SeqScan nodes.
>>> -Experiments show that this optimization can speed up Merge
>>> Join by up to 36%.
>>>
>>> 2. The planner makes the decision to use the bloom filter based on the
>>> estimated filtering rate and the expected performance gain.
>>> -The planner accomplishes this by estimating four statistics
>>> per join key variable: the total number of rows of the relation, the
>>> number of distinct values of the variable, and the minimum and
>>> maximum values of the variable (when applicable). Using these
>>> numbers, the planner estimates the filtering rate of a potential
>>> filter.
>>> -Because building and probing the filter adds overhead, there
>>> is a minimum filtering rate below which the filter is not actually
>>> useful. Based on our testing, we check whether the estimated
>>> filtering rate exceeds 35%, and that informs the decision to use a
>>> filter or not.
>>>
>>> 3. If using a bloom filter, the planner also adjusts the expected cost
>>> of Merge Join based on expected performance gain.
>>>
>>> 4. Capability to build the bloom filter in parallel in case of
>>> parallel SeqScan. This is done efficiently by populating a local bloom
>>> filter for each parallel worker and then taking a bitwise OR over all
>>> the local bloom filters to form a shared bloom filter at the end of
>>> the parallel SeqScan.
>>>
>>> 5. The optimization is GUC controlled, with settings of
>>> enable_mergejoin_semijoin_filter and force_mergejoin_semijoin_filter.
>>>
>>> We found in experiments that there is a significant improvement
>>> when using the bloom filter during Merge Join. One experiment involved
>>> joining two large tables while varying the theoretical filtering rate
>>> (TFR) between the two tables, where the TFR is defined as the
>>> percentage of each dataset that is disjoint from the other. Both
>>> tables in the merge join were the same size. We varied the TFR to
>>> measure the change from the filtering optimization.
>>>
>>> For example, let’s imagine t0 has 10 million rows, which contain the
>>> numbers 1 through 10 million randomly shuffled, and t1 has the
>>> numbers 4 million through 14 million randomly shuffled. Then the TFR
>>> for a join of these two tables is 40%, since 40% of each table is
>>> disjoint from the other (1 through 4 million for t0, 10 million
>>> through 14 million for t1).
>>>
>>> Here is the performance test result joining two tables:
>>> TFR: theoretical filtering rate
>>> EFR: estimated filtering rate
>>> AFR: actual filtering rate
>>> HJ: hash join
>>> MJ Default: default merge join
>>> MJ Filter: merge join with bloom filter optimization enabled
>>> MJ Filter Forced: merge join with bloom filter optimization forced
>>>
>>> TFR EFR AFR HJ MJ Default MJ Filter MJ Filter Forced
>>>
>>> -------------------------------------------------------------------------------------
>>> 10 33.46 7.41 6529 22638 21949 23160
>>> 20 37.27 14.85 6483 22290 21928 21930
>>> 30 41.32 22.25 6395 22374 20718 20794
>>> 40 45.67 29.7 6272 21969 19449 19410
>>> 50 50.41 37.1 6210 21412 18222 18224
>>> 60 55.64 44.51 6052 21108 17060 17018
>>> 70 61.59 51.98 5947 21020 15682 15737
>>> 80 68.64 59.36 5761 20812 14411 14437
>>> 90 77.83 66.86 5701 20585 13171 13200
>>> Table. Execution Time (ms) vs Filtering Rate (%) for Joining Two
>>> Tables of 10M Rows.
>>>
>>> Attached you can find figures of the same performance test and a SQL
>>> script to reproduce it.
>>>
>>> The first thing to notice is that Hash Join is generally the most
>>> efficient join strategy. This is because Hash Join is better at
>>> dealing with small tables, and our size of 10 million rows is still
>>> small enough that Hash Join outperforms the other join strategies.
>>> Future experiments can investigate much larger tables.
>>>
>>> However, comparing just within the different Merge Join variants, we
>>> see that using the bloom filter greatly improves performance, and the
>>> execution times scale roughly linearly with the filtering rate.
>>> Comparing forced filtering with the default, the default Merge Join
>>> outperforms Merge Join with filtering at low filtering rates, but
>>> beyond about 20% TFR, Merge Join with filtering wins. This makes
>>> intuitive sense, as there are fixed costs associated with building
>>> and probing the bloom filter. In the worst case, at only 10% TFR, the
>>> bloom filter makes Merge Join less than 5% slower; in the best case,
>>> at 90% TFR, it improves Merge Join by 36%.
>>>
>>> Based on the results of the above experiments, we came up with a
>>> linear model for the performance ratio of using filter pushdown as a
>>> function of the filtering rate. Based on the numbers presented in the
>>> figure, this is the equation:
>>>
>>> T_filter / T_no_filter = 1 / (0.83 * estimated filtering rate + 0.863)
>>>
>>> For example, this means that with an estimated filtering rate of 0.4,
>>> the execution time of merge join is estimated to improve by 16.3%.
>>> Note that the estimated filtering rate is used in the equation, not
>>> the theoretical or actual filtering rate, because it is what is
>>> available during planning. In practice the estimated filtering rate
>>> is often inaccurate; in fact, it differed from the theoretical
>>> filtering rate by as much as 17% in our experiments. One way to
>>> mitigate the loss of bloom filter effectiveness caused by an
>>> inaccurate estimate is to adaptively turn the filter off at
>>> execution time; this is yet to be implemented.
>>>
>>> Here is a list of tasks we plan to work on in order to improve this
>>> patch:
>>> 1. More regression testing to guarantee correctness.
>>> 2. More performance testing involving larger tables and complicated
>>> query plans.
>>> 3. Improve the cost model.
>>> 4. Explore runtime tuning such as making the bloom filter checking
>>> adaptive.
>>> 5. Currently, only the best single join key is used for building the
>>> Bloom filter. However, if there are several keys and we know that
>>> their distributions are somewhat disjoint, we could leverage this fact
>>> and use multiple keys for the bloom filter.
>>> 6. Currently, Bloom filter pushdown is only implemented for SeqScan
>>> nodes. However, it would be possible to allow push down to other types
>>> of scan nodes.
>>> 7. Explore if the Bloom filter could be pushed down through a foreign
>>> scan when the foreign server is capable of handling it – which could
>>> be made true for postgres_fdw.
>>> 8. Better explain command on the usage of bloom filters.
>>>
>>> This patch set is prepared by Marcus Ma, Lyu Pan and myself. Feedback
>>> is appreciated.
>>>
>>> With Regards,
>>> Zheng Li
>>> Amazon RDS/Aurora for PostgreSQL
>>>
>>> [1]
>>> https://www.postgresql.org/message-id/flat/c902844d-837f-5f63-ced3-9f7fd222f175%402ndquadrant.com
>>
>>
>> Hi,
>> In the header of patch 1:
>>
>> In this prototype, the cost model is based on an assumption that there is
>> a linear relationship between the performance gain from using a semijoin
>> filter and the estimated filtering rate:
>> % improvement to Merge Join cost = 0.83 * estimated filtering rate -
>> 0.137.
>>
>> How were the coefficients (0.83 and 0.137) determined ?
>> I guess they were based on the results of running certain workload.
>>
>> Cheers
>>
> Hi,
> For patch 1:
>
> +bool enable_mergejoin_semijoin_filter;
> +bool force_mergejoin_semijoin_filter;
>
> How would (enable_mergejoin_semijoin_filter = off,
> force_mergejoin_semijoin_filter = on) be interpreted ?
> Have you considered using one GUC which has three values: off, enabled,
> forced ?
>
> + mergeclauses_for_sjf = get_actual_clauses(path->path_mergeclauses);
> + mergeclauses_for_sjf =
> get_switched_clauses(path->path_mergeclauses,
> +
> path->jpath.outerjoinpath->parent->relids);
>
> mergeclauses_for_sjf is assigned twice, and the first assignment appears
> to be dead: the result of get_actual_clauses() is not passed to
> get_switched_clauses() and is immediately overwritten.
> Is this intentional ?
>
> + /* want at least 1000 rows_filtered to avoid any nasty edge
> cases */
> + if (force_mergejoin_semijoin_filter || (filteringRate >= 0.35
> && rows_filtered > 1000))
>
> The above condition is narrower than the enclosing condition.
> Since there is no else block for the second if statement, please merge
> the two if statements.
>
> + int best_filter_clause;
>
> Normally I would think `clause` is represented by List*. But
> best_filter_clause is an int. Please use another variable name so that
> there is less chance of confusion.
>
> For evaluate_semijoin_filtering_rate():
>
> + double best_sj_selectivity = 1.01;
>
> How was 1.01 determined ?
>
> + debug_sj1("SJPD: start evaluate_semijoin_filtering_rate");
>
> There are debug statements in the methods.
> It would be better to remove them in the next patch set.
>
> Cheers
>
Hi,
Still patch 1.

+ if (!outer_arg_md->is_or_maps_to_base_column
+ && !inner_arg_md->is_or_maps_to_constant)
+ {
+ debug_sj2("SJPD: outer equijoin arg does not map %s",
+ "to a base column nor a constant; semijoin is not
valid");

Looks like there is a typo: inner_arg_md->is_or_maps_to_constant should be
outer_arg_md->is_or_maps_to_constant

+ if (outer_arg_md->est_col_width > MAX_SEMIJOIN_SINGLE_KEY_WIDTH)
+ {
+ debug_sj2("SJPD: outer equijoin column's width %s",
+ "was excessive; condition rejected");

How is the value of MAX_SEMIJOIN_SINGLE_KEY_WIDTH determined ?

For verify_valid_pushdown():

+ Assert(path);
+ Assert(target_var_no > 0);
+
+ if (path == NULL)
+ {
+ return false;

I don't understand the first assertion. Does it mean path would always be
non-NULL ? Then the if statement should be dropped.

+ if (path->parent->relid == target_var_no)
+ {
+ /*
+ * Found source of target var! We know that the pushdown
+ * is valid now.
+ */
+ return true;
+ }
+ return false;

The above can be simplified as: return path->parent->relid == target_var_no;

+ * True if the given con_exprs, ref_exprs and operators will exactlty

Typo: exactlty -> exactly

+ if (!bms_equal(all_vars, matched_vars))
+ return false;
+ return true;

The above can be simplified as: return bms_equal(all_vars, matched_vars);

Cheers
