From: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Add support for (Var op Var) clause in extended MCV statistics |
Date: | 2024-08-12 10:42:07 |
Message-ID: | 46d09d66-1692-4a12-a106-06981a47cb28@tantorlabs.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi hackers,
I'd like to submit a patch that improves the estimated rows for queries
containing (Var op Var) clauses by applying extended MCV statistics.
*New functions:*
* mcv_clauselist_selectivity_var_op_var() - calculates the selectivity
for (Var op Var) clauses.
* is_opclause_var_op_var() - Checks whether a clause is of the (Var op
Var) form.
*Implementation Details:*
* A new 'if' statement was added to the 'clause_selectivity_ext()'
function to handle (Var op Var) clauses. This allows the process to
locate matching MCV extended statistics and calculate selectivity
using the newly introduced function.
* Additionally, I added 'if' statement
in statext_is_compatible_clause_internal() function to determine
which columns are included in the clause, find matching extended
statistics, and then calculate selectivity through the new function.
I did the same in mcv_get_match_bitmap() to check what values are
true for (Var op Var).
* To support this, I created a new enum type to differentiate between
OR/AND and (Var op Var) clauses.
*Examples:*
create table t (a int, b int);
insert into t select mod(i,10), mod(i,10)+1 from
generate_series(1,100000) s(i);
analyze t;
explain select * from t where a < b;
`
Estimated: 33333
Actual: 100000
explain select * from t where a > b;
`
Estimated: 33333
Actual: 100000
create statistics s (mcv) on a,b from t;
analyze t;
explain select * from t where a < b;
`
Estimated without patch: 33333
Estimated with patch: 100000
Actual: 100000
explain select * from t where a > b;
`
Estimated without patch: 33333
Estimated with patch: 100000
Actual: 100000
If you want to see more examples, see regress tests in the patch.
*Previous thread:*
This feature was originally developed two years ago in [1], and at that
time, the approach was almost the same. My implementation uses dedicated
functions and 'if' statements directly for better readability and
maintainability. Additionally, there was a bug in the previous approach
that has been resolved with my patch. Here’s an example of the bug and
its fix:
CREATE TABLE foo (a int, b int);
INSERT INTO foo SELECT x/10+1, x FROM generate_series(1,10000) g(x);
ANALYZE foo;
EXPLAIN ANALYZE SELECT * FROM foo WHERE a = 1 OR (b > 0 AND b < 10);
`
Estimated: 18
Actual: 9
CREATE STATISTICS foo_s (mcv) ON a,b FROM foo;
ANALYZE foo;
EXPLAIN ANALYZE SELECT * FROM foo WHERE a = 1 OR (b > 0 AND b < 10);
`
Estimated previous patch: 18
Estimated current patch: 9
Actual: 9
[1]:
https://www.postgresql.org/message-id/flat/9e0a12e0-c05f-b193-ed3d-fe88bc1e5fe1%40enterprisedb.com
I look forward to any feedback or suggestions from the community.
Best regars,
Ilia Evdokimov
Tantor Labs LLC.
Attachment | Content-Type | Size |
---|---|---|
v1-Add-support-for-Var-op-Var-clause-in-extended-MCV-st.patch | text/x-patch | 48.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | John Naylor | 2024-08-12 10:46:22 | Re: ECPG cleanup and fix for clang compile-time problem |
Previous Message | Bertrand Drouvot | 2024-08-12 10:35:44 | Re: Historic snapshot doesn't track txns committed in BUILDING_SNAPSHOT state |