Re: Bloom filter Pushdown Optimization for Merge Join

From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: Zheng Li <zhengli10(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Bloom filter Pushdown Optimization for Merge Join
Date: 2022-10-02 13:40:30
Message-ID: CALNJ-vRHUtztCGpjdvas9KyOndoOqwH1ZM8K9GoEGKJAHaW8fQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Oct 1, 2022 at 12:45 AM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:

>
>
> On Fri, Sep 30, 2022 at 9:20 PM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
>
>>
>>
>> On Fri, Sep 30, 2022 at 8:40 PM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
>>
>>>
>>>
>>> On Fri, Sep 30, 2022 at 3:44 PM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>>>
>>>> Hello,
>>>>
>>>> A bloom filter provides early filtering of rows that cannot be joined
>>>> before they would reach the join operator, the optimization is also
>>>> called a semi join filter (SJF) pushdown. Such a filter can be created
>>>> when one child of the join operator must materialize its derived table
>>>> before the other child is evaluated.
>>>>
>>>> For example, a bloom filter can be created using the the join keys for
>>>> the build side/inner side of a hash join or the outer side of a merge
>>>> join, the bloom filter can then be used to pre-filter rows on the
>>>> other side of the join operator during the scan of the base relation.
>>>> The thread about “Hash Joins vs. Bloom Filters / take 2” [1] is good
>>>> discussion on using such optimization for hash join without going into
>>>> the pushdown of the filter where its performance gain could be further
>>>> increased.
>>>>
>>>> We worked on prototyping bloom filter pushdown for both hash join and
>>>> merge join. Attached is a patch set for bloom filter pushdown for
>>>> merge join. We also plan to send the patch for hash join once we have
>>>> it rebased.
>>>>
>>>> Here is a summary of the patch set:
>>>> 1. Bloom Filter Pushdown optimizes Merge Join by filtering rows early
>>>> during the table scan instead of later on.
>>>> -The bloom filter is pushed down along the execution tree to
>>>> the target SeqScan nodes.
>>>> -Experiments show that this optimization can speed up Merge
>>>> Join by up to 36%.
>>>>
>>>> 2. The planner makes the decision to use the bloom filter based on the
>>>> estimated filtering rate and the expected performance gain.
>>>> -The planner accomplishes this by estimating four numbers per
>>>> variable - the total number of rows of the relation, the number of
>>>> distinct values for a given variable, and the minimum and maximum
>>>> value of the variable (when applicable). Using these numbers, the
>>>> planner estimates a filtering rate of a potential filter.
>>>> -Because actually creating and implementing the filter adds
>>>> more operations, there is a minimum threshold of filtering where the
>>>> filter would actually be useful. Based on testing, we query to see if
>>>> the estimated filtering rate is higher than 35%, and that informs our
>>>> decision to use a filter or not.
>>>>
>>>> 3. If using a bloom filter, the planner also adjusts the expected cost
>>>> of Merge Join based on expected performance gain.
>>>>
>>>> 4. Capability to build the bloom filter in parallel in case of
>>>> parallel SeqScan. This is done efficiently by populating a local bloom
>>>> filter for each parallel worker and then taking a bitwise OR over all
>>>> the local bloom filters to form a shared bloom filter at the end of
>>>> the parallel SeqScan.
>>>>
>>>> 5. The optimization is GUC controlled, with settings of
>>>> enable_mergejoin_semijoin_filter and force_mergejoin_semijoin_filter.
>>>>
>>>> We found in experiments that there is a significant improvement
>>>> when using the bloom filter during Merge Join. One experiment involved
>>>> joining two large tables while varying the theoretical filtering rate
>>>> (TFR) between the two tables, the TFR is defined as the percentage
>>>> that the two datasets are disjoint. Both tables in the merge join were
>>>> the same size. We tested changing the TFR to see the change in
>>>> filtering optimization.
>>>>
>>>> For example, let’s imagine t0 has 10 million rows, which contain the
>>>> numbers 1 through 10 million randomly shuffled. Also, t1 has the
>>>> numbers 4 million through 14 million randomly shuffled. Then the TFR
>>>> for a join of these two tables is 40%, since 40% of the tables are
>>>> disjoint from the other table (1 through 4 million for t0, 10 million
>>>> through 14 million for t4).
>>>>
>>>> Here is the performance test result joining two tables:
>>>> TFR: theoretical filtering rate
>>>> EFR: estimated filtering rate
>>>> AFR: actual filtering rate
>>>> HJ: hash join
>>>> MJ Default: default merge join
>>>> MJ Filter: merge join with bloom filter optimization enabled
>>>> MJ Filter Forced: merge join with bloom filter optimization forced
>>>>
>>>> TFR EFR AFR HJ MJ Default MJ Filter MJ Filter Forced
>>>>
>>>> -------------------------------------------------------------------------------------
>>>> 10 33.46 7.41 6529 22638 21949 23160
>>>> 20 37.27 14.85 6483 22290 21928 21930
>>>> 30 41.32 22.25 6395 22374 20718 20794
>>>> 40 45.67 29.7 6272 21969 19449 19410
>>>> 50 50.41 37.1 6210 21412 18222 18224
>>>> 60 55.64 44.51 6052 21108 17060 17018
>>>> 70 61.59 51.98 5947 21020 15682 15737
>>>> 80 68.64 59.36 5761 20812 14411 14437
>>>> 90 77.83 66.86 5701 20585 13171 13200
>>>> Table. Execution Time (ms) vs Filtering Rate (%) for Joining Two
>>>> Tables of 10M Rows.
>>>>
>>>> Attached you can find figures of the same performance test and a SQL
>>>> script
>>>> to reproduce the performance test.
>>>>
>>>> The first thing to notice is that Hash Join generally is the most
>>>> efficient join strategy. This is because Hash Join is better at
>>>> dealing with small tables, and our size of 10 million is still small
>>>> enough where Hash Join outperforms the other join strategies. Future
>>>> experiments can investigate using much larger tables.
>>>>
>>>> However, comparing just within the different Merge Join variants, we
>>>> see that using the bloom filter greatly improves performance.
>>>> Intuitively, all of these execution times follow linear paths.
>>>> Comparing forced filtering versus default, we can see that the default
>>>> Merge Join outperforms Merge Join with filtering at low filter rates,
>>>> but after about 20% TFR, the Merge Join with filtering outperforms
>>>> default Merge Join. This makes intuitive sense, as there are some
>>>> fixed costs associated with building and checking with the bloom
>>>> filter. In the worst case, at only 10% TFR, the bloom filter makes
>>>> Merge Join less than 5% slower. However, in the best case, at 90% TFR,
>>>> the bloom filter improves Merge Join by 36%.
>>>>
>>>> Based on the results of the above experiments, we came up with a
>>>> linear equation for the performance ratio for using the filter
>>>> pushdown from the actual filtering rate. Based on the numbers
>>>> presented in the figure, this is the equation:
>>>>
>>>> T_filter / T_no_filter = 1 / (0.83 * estimated filtering rate + 0.863)
>>>>
>>>> For example, this means that with an estimated filtering rate of 0.4,
>>>> the execution time of merge join is estimated to be improved by 16.3%.
>>>> Note that the estimated filtering rate is used in the equation, not
>>>> the theoretical filtering rate or the actual filtering rate because it
>>>> is what we have during planning. In practice the estimated filtering
>>>> rate isn’t usually accurate. In fact, the estimated filtering rate can
>>>> differ from the theoretical filtering rate by as much as 17% in our
>>>> experiments. One way to mitigate the power loss of bloom filter caused
>>>> by inaccurate estimated filtering rate is to adaptively turn it off at
>>>> execution time, this is yet to be implemented.
>>>>
>>>> Here is a list of tasks we plan to work on in order to improve this
>>>> patch:
>>>> 1. More regression testing to guarantee correctness.
>>>> 2. More performance testing involving larger tables and complicated
>>>> query plans.
>>>> 3. Improve the cost model.
>>>> 4. Explore runtime tuning such as making the bloom filter checking
>>>> adaptive.
>>>> 5. Currently, only the best single join key is used for building the
>>>> Bloom filter. However, if there are several keys and we know that
>>>> their distributions are somewhat disjoint, we could leverage this fact
>>>> and use multiple keys for the bloom filter.
>>>> 6. Currently, Bloom filter pushdown is only implemented for SeqScan
>>>> nodes. However, it would be possible to allow push down to other types
>>>> of scan nodes.
>>>> 7. Explore if the Bloom filter could be pushed down through a foreign
>>>> scan when the foreign server is capable of handling it – which could
>>>> be made true for postgres_fdw.
>>>> 8. Better explain command on the usage of bloom filters.
>>>>
>>>> This patch set is prepared by Marcus Ma, Lyu Pan and myself. Feedback
>>>> is appreciated.
>>>>
>>>> With Regards,
>>>> Zheng Li
>>>> Amazon RDS/Aurora for PostgreSQL
>>>>
>>>> [1]
>>>> https://www.postgresql.org/message-id/flat/c902844d-837f-5f63-ced3-9f7fd222f175%402ndquadrant.com
>>>
>>>
>>> Hi,
>>> In the header of patch 1:
>>>
>>> In this prototype, the cost model is based on an assumption that there
>>> is a linear relationship between the performance gain from using a semijoin
>>> filter and the estimated filtering rate:
>>> % improvement to Merge Join cost = 0.83 * estimated filtering rate -
>>> 0.137.
>>>
>>> How were the coefficients (0.83 and 0.137) determined ?
>>> I guess they were based on the results of running certain workload.
>>>
>>> Cheers
>>>
>> Hi,
>> For patch 1:
>>
>> +bool enable_mergejoin_semijoin_filter;
>> +bool force_mergejoin_semijoin_filter;
>>
>> How would (enable_mergejoin_semijoin_filter = off,
>> force_mergejoin_semijoin_filter = on) be interpreted ?
>> Have you considered using one GUC which has three values: off, enabled,
>> forced ?
>>
>> + mergeclauses_for_sjf =
>> get_actual_clauses(path->path_mergeclauses);
>> + mergeclauses_for_sjf =
>> get_switched_clauses(path->path_mergeclauses,
>> +
>> path->jpath.outerjoinpath->parent->relids);
>>
>> mergeclauses_for_sjf is assigned twice and I don't
>> see mergeclauses_for_sjf being reference in the call
>> to get_switched_clauses().
>> Is this intentional ?
>>
>> + /* want at least 1000 rows_filtered to avoid any nasty edge
>> cases */
>> + if (force_mergejoin_semijoin_filter || (filteringRate >= 0.35
>> && rows_filtered > 1000))
>>
>> The above condition is narrower compared to the enclosing condition.
>> Since there is no else block for the second if block, please merge the
>> two if statements.
>>
>> + int best_filter_clause;
>>
>> Normally I would think `clause` is represented by List*. But
>> best_filter_clause is an int. Please use another variable name so that
>> there is less chance of confusion.
>>
>> For evaluate_semijoin_filtering_rate():
>>
>> + double best_sj_selectivity = 1.01;
>>
>> How was 1.01 determined ?
>>
>> + debug_sj1("SJPD: start evaluate_semijoin_filtering_rate");
>>
>> There are debug statements in the methods.
>> It would be better to remove them in the next patch set.
>>
>> Cheers
>>
> Hi,
> Still patch 1.
>
> + if (!outer_arg_md->is_or_maps_to_base_column
> + && !inner_arg_md->is_or_maps_to_constant)
> + {
> + debug_sj2("SJPD: outer equijoin arg does not map %s",
> + "to a base column nor a constant; semijoin is not
> valid");
>
> Looks like there is a typo: inner_arg_md->is_or_maps_to_constant should
> be outer_arg_md->is_or_maps_to_constant
>
> + if (outer_arg_md->est_col_width > MAX_SEMIJOIN_SINGLE_KEY_WIDTH)
> + {
> + debug_sj2("SJPD: outer equijoin column's width %s",
> + "was excessive; condition rejected");
>
> How is the value of MAX_SEMIJOIN_SINGLE_KEY_WIDTH determined ?
>
> For verify_valid_pushdown():
>
> + Assert(path);
> + Assert(target_var_no > 0);
> +
> + if (path == NULL)
> + {
> + return false;
>
> I don't understand the first assertion. Does it mean path would always be
> non-NULL ? Then the if statement should be dropped.
>
> + if (path->parent->relid == target_var_no)
> + {
> + /*
> + * Found source of target var! We know that the
> pushdown
> + * is valid now.
> + */
> + return true;
> + }
> + return false;
>
> The above can be simplified as: return path->parent->relid ==
> target_var_no;
>
> + * True if the given con_exprs, ref_exprs and operators will exactlty
>
> Typo: exactlty -> exactly
>
> + if (!bms_equal(all_vars, matched_vars))
> + return false;
> + return true;
>
> The above can be simplified as: return bms_equal(all_vars, matched_vars);
>
> Cheers
>
Hi,
Still in patch 1 :-)

+ if (best_path->use_semijoinfilter)
+ {
+ if (best_path->best_mergeclause != -1)

Since there is no else block, the two conditions can be combined.

+ ListCell *clause_cell = list_nth_cell(mergeclauses,
best_path->best_mergeclause);

As shown in the above code, best_mergeclause is the position of the best
merge clause in mergeclauses.
I think best_mergeclause_pos (or similar name) is more appropriate for the
fieldname.

For depth_of_semijoin_target():

+ * Parameters:
+ * node: plan node to be considered for semijoin push down.

The name of the parameter is pn - please align the comment with code.

For T_SubqueryScan case in depth_of_semijoin_target():

+ Assert(rte->subquery->targetList);
...
+ if (rel && rel->subroot
+ && rte && rte->subquery && rte->subquery->targetList)

It seems the condition can be simplified since rte->subquery->targetList
has passed the assertion.

For is_table_scan_node_source_of_relids_or_var(), the else block can be
simplified to returning scan_node_varno == target_var->varno directly.

For get_appendrel_occluded_references():

+ * Given a virtual column from an Union ALL subquery,
+ * return the expression it immediately occludes that satisfy

Since the index is returned from the func, it would be better to clarify
the comment by saying `return the last index of expression ...`

+ /* Subquery without append and partitioned tables */

append and partitioned tables -> append or partitioned tables

More reviews for subsequent patches to follow.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhihong Yu 2022-10-02 14:42:47 Re: Bloom filter Pushdown Optimization for Merge Join
Previous Message Robert Haas 2022-10-02 10:43:45 Re: disfavoring unparameterized nested loops