From: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Use extended statistics to estimate (Var op Var) clauses |
Date: | 2021-08-28 16:30:12 |
Message-ID: | CCD1B8BF-3376-454C-A2F3-9AB57A401A57@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> On Aug 28, 2021, at 6:52 AM, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> Part 0003 fixes handling of those clauses so that we don't treat them as simple, but it does that by tweaking statext_is_compatible_clause(), as suggested by Dean.
Function examine_opclause_args() doesn't set issimple to anything in the IsA(rightop, Const) case, but assigns *issimplep = issimple at the bottom. The compiler is not complaining about using a possibly uninitialized variable, but if I change the "return true" on the very next line to "return issimple", the compiler complains quite loudly.
Some functions define bool *issimple, others bool *issimplep and bool issimple. You might want to standardize the naming.
It's difficult to know what "simple" means in extended_stats.c. There is no file-global comment explaining the concept, and functions like compare_scalars_simple don't have correlates named compare_scalars_complex or such, so the reader cannot infer by comparison what the difference might be between a "simple" case and some non-"simple" case. The functions' issimple (or issimplep) argument are undocumented.
There is a comment:
/*
* statext_mcv_clauselist_selectivity
* Estimate clauses using the best multi-column statistics.
....
*
* - simple selectivity: Computed without extended statistics, i.e. as if the
* columns/clauses were independent.
*
....
*/
but it takes a while to find if you search for "issimple".
In both scalarineqsel_wrapper() and eqsel_internal(), the call to matching_restriction_variables() should usually return false, since comparing a variable to itself is an unusual case. The next call is to get_restriction_variable(), which repeats the work of examining the left and right variables. So in almost all cases, after throwing away the results of:
examine_variable(root, left, varRelid, &ldata);
examine_variable(root, right, varRelid, &rdata);
performed in matching_restriction_variables(), we'll do exactly the same work again (with one variable named differently) in get_restriction_variable():
examine_variable(root, left, varRelid, vardata);
examine_variable(root, right, varRelid, &rdata);
That'd be fine if example_variable() were a cheap function, but it appears not to be. Do you think you could save the results rather than recomputing them? It's a little messy, since these are the only two functions out of about ten which follow this pattern, so you'd have to pass NULLs into get_restriction_variable() from the other eight callers, but it still looks like that would be a win.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Zhihong Yu | 2021-08-28 17:18:06 | Re: Use extended statistics to estimate (Var op Var) clauses |
Previous Message | Tom Lane | 2021-08-28 14:28:50 | Re: Can we get rid of repeated queries from pg_dump? |