From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: Zheng Li <zhengli10(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Bloom filter Pushdown Optimization for Merge Join
Date: 2022-10-01 04:20:54
Message-ID: CALNJ-vQN+qwhAnJuJO0E-PD-ww86M9vLFWcC+mCjPx9CEJGsWA@mail.gmail.com
Lists: pgsql-hackers
On Fri, Sep 30, 2022 at 8:40 PM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
>
>
> On Fri, Sep 30, 2022 at 3:44 PM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
>> Hello,
>>
>> A bloom filter provides early filtering of rows that cannot be joined
>> before they reach the join operator; this optimization is also called
>> a semi-join filter (SJF) pushdown. Such a filter can be created when
>> one child of the join operator must materialize its derived table
>> before the other child is evaluated.
>>
>> For example, a bloom filter can be created from the join keys of the
>> build/inner side of a hash join or the outer side of a merge join;
>> the filter can then be used to pre-filter rows on the other side of
>> the join operator during the scan of the base relation. The thread
>> “Hash Joins vs. Bloom Filters / take 2” [1] is a good discussion of
>> using this optimization for hash join, without going into pushdown of
>> the filter, which could further increase its performance gain.
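To make the idea concrete, here is a minimal, self-contained bloom filter sketch in C. This is not the patch's implementation (PostgreSQL's real one lives in src/backend/lib/bloomfilter.c); the sizes, hash function, and names here are purely illustrative:

```c
/* Minimal bloom filter sketch -- illustration only, not the patch's code. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS   1024                 /* filter size in bits (power of two) */
#define BLOOM_WORDS  (BLOOM_BITS / 64)
#define BLOOM_NHASH  3                    /* number of hash probes per key */

typedef struct BloomFilter
{
    uint64_t words[BLOOM_WORDS];
} BloomFilter;

/* Cheap 64-bit mixer (splitmix64 finalizer); stand-in for a real hash. */
static uint64_t
bloom_hash(uint64_t x, uint64_t seed)
{
    x += seed + 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

static void
bloom_add(BloomFilter *bf, uint64_t key)
{
    for (int i = 0; i < BLOOM_NHASH; i++)
    {
        uint64_t bit = bloom_hash(key, (uint64_t) i) % BLOOM_BITS;

        bf->words[bit / 64] |= UINT64_C(1) << (bit % 64);
    }
}

/* false => key is definitely absent; true => key may be present. */
static bool
bloom_maybe_contains(const BloomFilter *bf, uint64_t key)
{
    for (int i = 0; i < BLOOM_NHASH; i++)
    {
        uint64_t bit = bloom_hash(key, (uint64_t) i) % BLOOM_BITS;

        if (!(bf->words[bit / 64] & (UINT64_C(1) << (bit % 64))))
            return false;
    }
    return true;
}
```

The one-sided error is what makes pushdown safe: a scan may only discard rows for which the filter answers "definitely absent", so no joinable row is ever lost, while false positives merely pass through to the join.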
>>
>> We worked on prototyping bloom filter pushdown for both hash join and
>> merge join. Attached is a patch set for bloom filter pushdown for
>> merge join. We also plan to send the patch for hash join once we have
>> it rebased.
>>
>> Here is a summary of the patch set:
>> 1. Bloom Filter Pushdown optimizes Merge Join by filtering rows early
>> during the table scan instead of later on.
>> -The bloom filter is pushed down along the execution tree to
>> the target SeqScan nodes.
>> -Experiments show that this optimization can speed up Merge
>> Join by up to 36%.
>>
>> 2. The planner makes the decision to use the bloom filter based on the
>> estimated filtering rate and the expected performance gain.
>> -The planner accomplishes this by estimating four numbers per
>> variable - the total number of rows of the relation, the number of
>> distinct values for a given variable, and the minimum and maximum
>> value of the variable (when applicable). Using these numbers, the
>> planner estimates a filtering rate of a potential filter.
>> -Because actually creating and applying the filter adds more
>> operations, there is a minimum filtering threshold below which the
>> filter would not be useful. Based on testing, we check whether the
>> estimated filtering rate is higher than 35%, and that informs our
>> decision to use a filter or not.
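As a rough illustration of how these four per-variable statistics could combine into a filtering-rate estimate, here is a sketch in C. The struct, function name, and combination formula are all hypothetical, not the patch's actual cost code:

```c
#include <assert.h>

/* Per-variable statistics of the kind described above (names hypothetical). */
typedef struct VarStats
{
    double nrows;      /* estimated number of rows in the relation */
    double ndistinct;  /* estimated number of distinct values */
    double min;        /* minimum value, when applicable */
    double max;        /* maximum value, when applicable */
} VarStats;

/*
 * Sketch: estimate the fraction of probe-side rows whose join key cannot
 * appear on the build side, from range overlap plus distinct-value density.
 * Illustration of the idea only, not the patch's implementation.
 */
static double
estimate_filtering_rate(const VarStats *build, const VarStats *probe)
{
    double lo = build->min > probe->min ? build->min : probe->min;
    double hi = build->max < probe->max ? build->max : probe->max;
    double overlap;
    double build_density;
    double survive;

    if (hi <= lo)
        return 1.0;             /* disjoint ranges: everything is filtered */

    /* Fraction of the probe range overlapping the build range... */
    overlap = (hi - lo) / (probe->max - probe->min);

    /* ...thinned by how densely the build side covers that overlap. */
    build_density = build->ndistinct / (build->max - build->min);
    if (build_density > 1.0)
        build_density = 1.0;

    survive = overlap * build_density;
    return survive > 1.0 ? 0.0 : 1.0 - survive;
}
```

With the t0/t1 example below (keys 1..10M vs 4M..14M, all values distinct), this sketch produces a rate near the 40% theoretical figure.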
>>
>> 3. If using a bloom filter, the planner also adjusts the expected cost
>> of Merge Join based on expected performance gain.
>>
>> 4. Capability to build the bloom filter in parallel in case of
>> parallel SeqScan. This is done efficiently by populating a local bloom
>> filter for each parallel worker and then taking a bitwise OR over all
>> the local bloom filters to form a shared bloom filter at the end of
>> the parallel SeqScan.
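The merge step in point 4 can be sketched as follows. This is illustrative C with hypothetical sizes and names; in the actual patch the target would be a filter in dynamic shared memory, possibly merged with atomic fetch-or if workers merge concurrently:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FILTER_WORDS 16          /* filter size in 64-bit words (hypothetical) */

/* One fixed-size local filter per parallel worker. */
typedef struct LocalBloom
{
    uint64_t words[FILTER_WORDS];
} LocalBloom;

/*
 * Merge the worker-local filters into the shared filter with a bitwise OR.
 * A bit set in any local filter must end up set in the shared one, and OR
 * is commutative, so the merge order across workers does not matter.
 */
static void
merge_local_filters(uint64_t *shared, const LocalBloom *locals, int nworkers)
{
    memset(shared, 0, FILTER_WORDS * sizeof(uint64_t));
    for (int w = 0; w < nworkers; w++)
        for (int i = 0; i < FILTER_WORDS; i++)
            shared[i] |= locals[w].words[i];
}
```

Because each worker only writes its own local filter during the scan, no locking is needed until the final merge.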
>>
>> 5. The optimization is GUC controlled, with settings of
>> enable_mergejoin_semijoin_filter and force_mergejoin_semijoin_filter.
>>
>> We found in experiments that there is a significant improvement
>> when using the bloom filter during Merge Join. One experiment involved
>> joining two large tables while varying the theoretical filtering rate
>> (TFR) between the two tables; the TFR is defined as the percentage of
>> the two datasets that is disjoint. Both tables in the merge join were
>> the same size. We varied the TFR to see the change in the filtering
>> optimization.
>>
>> For example, suppose t0 has 10 million rows containing the numbers 1
>> through 10 million, randomly shuffled, and t1 has the numbers 4
>> million through 14 million, randomly shuffled. Then the TFR for a
>> join of these two tables is 40%, since 40% of each table is disjoint
>> from the other table (1 through 4 million for t0, 10 million through
>> 14 million for t1).
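Under the uniform-key assumption above, the TFR reduces to a range-overlap computation; a small sketch (illustrative, not from the patch or its test script):

```c
#include <assert.h>

/*
 * Fraction of table a's key range that is disjoint from table b's range,
 * assuming uniformly distributed keys.  Reproduces the 40% TFR figure for
 * the t0/t1 example.
 */
static double
theoretical_filtering_rate(double lo_a, double hi_a, double lo_b, double hi_b)
{
    double overlap_lo = lo_a > lo_b ? lo_a : lo_b;
    double overlap_hi = hi_a < hi_b ? hi_a : hi_b;
    double overlap = overlap_hi > overlap_lo ? overlap_hi - overlap_lo : 0.0;

    return 1.0 - overlap / (hi_a - lo_a);
}
```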
>>
>> Here is the performance test result joining two tables:
>> TFR: theoretical filtering rate
>> EFR: estimated filtering rate
>> AFR: actual filtering rate
>> HJ: hash join
>> MJ Default: default merge join
>> MJ Filter: merge join with bloom filter optimization enabled
>> MJ Filter Forced: merge join with bloom filter optimization forced
>>
>> TFR    EFR    AFR     HJ   MJ Default   MJ Filter   MJ Filter Forced
>> --------------------------------------------------------------------
>>  10  33.46   7.41   6529        22638       21949              23160
>>  20  37.27  14.85   6483        22290       21928              21930
>>  30  41.32  22.25   6395        22374       20718              20794
>>  40  45.67  29.70   6272        21969       19449              19410
>>  50  50.41  37.10   6210        21412       18222              18224
>>  60  55.64  44.51   6052        21108       17060              17018
>>  70  61.59  51.98   5947        21020       15682              15737
>>  80  68.64  59.36   5761        20812       14411              14437
>>  90  77.83  66.86   5701        20585       13171              13200
>>
>> Table. Execution Time (ms) vs Filtering Rate (%) for Joining Two
>> Tables of 10M Rows.
>>
>> Attached you can find figures of the same performance test and a SQL
>> script to reproduce the performance test.
>>
>> The first thing to notice is that Hash Join is generally the most
>> efficient join strategy. This is because Hash Join deals well with
>> small tables, and our size of 10 million rows is still small enough
>> that Hash Join outperforms the other join strategies. Future
>> experiments can investigate much larger tables.
>>
>> However, comparing just within the different Merge Join variants, we
>> see that using the bloom filter greatly improves performance.
>> All of these execution times follow roughly linear trends.
>> Comparing forced filtering versus default, we can see that the default
>> Merge Join outperforms Merge Join with filtering at low filter rates,
>> but after about 20% TFR, the Merge Join with filtering outperforms
>> default Merge Join. This makes intuitive sense, as there are some
>> fixed costs associated with building and checking with the bloom
>> filter. In the worst case, at only 10% TFR, the bloom filter makes
>> Merge Join less than 5% slower. However, in the best case, at 90% TFR,
>> the bloom filter improves Merge Join by 36%.
>>
>> Based on the results of the above experiments, we fit a linear
>> equation relating the performance ratio of using the filter pushdown
>> to the actual filtering rate. Based on the numbers presented in the
>> figure, this is the equation:
>>
>> T_filter / T_no_filter = 1 / (0.83 * estimated filtering rate + 0.863)
>>
>> For example, this means that with an estimated filtering rate of 0.4,
>> the execution time of merge join is estimated to improve by 16.3%.
>> Note that the equation uses the estimated filtering rate, not the
>> theoretical or actual filtering rate, because it is what we have
>> during planning. In practice the estimated filtering rate isn't
>> always accurate; in fact, it differed from the theoretical filtering
>> rate by as much as 17% in our experiments. One way to mitigate the
>> loss of filter effectiveness caused by an inaccurate estimate is to
>> adaptively turn the filter off at execution time; this is yet to be
>> implemented.
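As a sanity check of the arithmetic, the fitted equation can be evaluated directly (coefficients copied from the equation above; the helper itself is illustrative, not from the patch):

```c
#include <assert.h>

/*
 * Ratio of filtered to unfiltered merge join execution time, per the
 * linear fit T_filter / T_no_filter = 1 / (0.83 * rate + 0.863).
 */
static double
filter_time_ratio(double est_filtering_rate)
{
    return 1.0 / (0.83 * est_filtering_rate + 0.863);
}
```

At an estimated rate of 0.4 the ratio is 1 / 1.195 ≈ 0.837, i.e. the roughly 16.3% improvement quoted above.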
>>
>> Here is a list of tasks we plan to work on in order to improve this patch:
>> 1. More regression testing to guarantee correctness.
>> 2. More performance testing involving larger tables and complicated query
>> plans.
>> 3. Improve the cost model.
>> 4. Explore runtime tuning such as making the bloom filter checking
>> adaptive.
>> 5. Currently, only the best single join key is used for building the
>> Bloom filter. However, if there are several keys and we know that
>> their distributions are somewhat disjoint, we could leverage this fact
>> and use multiple keys for the bloom filter.
>> 6. Currently, Bloom filter pushdown is only implemented for SeqScan
>> nodes. However, it would be possible to allow push down to other types
>> of scan nodes.
>> 7. Explore if the Bloom filter could be pushed down through a foreign
>> scan when the foreign server is capable of handling it – which could
>> be made true for postgres_fdw.
>> 8. Better EXPLAIN output for the usage of bloom filters.
>>
>> This patch set is prepared by Marcus Ma, Lyu Pan and myself. Feedback
>> is appreciated.
>>
>> With Regards,
>> Zheng Li
>> Amazon RDS/Aurora for PostgreSQL
>>
>> [1]
>> https://www.postgresql.org/message-id/flat/c902844d-837f-5f63-ced3-9f7fd222f175%402ndquadrant.com
>
>
> Hi,
> In the header of patch 1:
>
> In this prototype, the cost model is based on an assumption that there is
> a linear relationship between the performance gain from using a semijoin
> filter and the estimated filtering rate:
> % improvement to Merge Join cost = 0.83 * estimated filtering rate - 0.137.
>
> How were the coefficients (0.83 and 0.137) determined ?
> I guess they were based on the results of running certain workload.
>
> Cheers
>
Hi,
For patch 1:
+bool enable_mergejoin_semijoin_filter;
+bool force_mergejoin_semijoin_filter;
How would (enable_mergejoin_semijoin_filter = off,
force_mergejoin_semijoin_filter = on) be interpreted ?
Have you considered using one GUC which has three values: off, enabled,
forced ?
+ mergeclauses_for_sjf = get_actual_clauses(path->path_mergeclauses);
+ mergeclauses_for_sjf = get_switched_clauses(path->path_mergeclauses,
+                                             path->jpath.outerjoinpath->parent->relids);
mergeclauses_for_sjf is assigned twice, and I don't see the first
assignment being referenced in the call to get_switched_clauses().
Is this intentional ?
+        /* want at least 1000 rows_filtered to avoid any nasty edge cases */
+        if (force_mergejoin_semijoin_filter ||
+            (filteringRate >= 0.35 && rows_filtered > 1000))
The above condition is narrower than the enclosing condition.
Since there is no else block for the second if statement, please merge
the two if statements.
+ int best_filter_clause;
Normally I would think `clause` is represented by List*. But
best_filter_clause is an int. Please use another variable name so that
there is less chance of confusion.
For evaluate_semijoin_filtering_rate():
+ double best_sj_selectivity = 1.01;
How was 1.01 determined ?
+ debug_sj1("SJPD: start evaluate_semijoin_filtering_rate");
There are debug statements in the methods.
It would be better to remove them in the next patch set.
Cheers