From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Additional improvements to extended statistics |
Date: | 2020-03-18 19:31:28 |
Message-ID: | 20200318193128.53soruotqdqnlaoy@development |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, Mar 15, 2020 at 12:37:37PM +0000, Dean Rasheed wrote:
>On Sun, 15 Mar 2020 at 00:08, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>> On Sat, Mar 14, 2020 at 05:56:10PM +0100, Tomas Vondra wrote:
>> >
>> >Attached is a patch series rebased on top of the current master, after
>> >committing the ScalarArrayOpExpr enhancements. I've updated the OR patch
>> >to get rid of the code duplication, and barring objections I'll get it
>> >committed shortly together with the two parts improving test coverage.
>>
>> I've pushed the two patches improving test coverage for functional
>> dependencies and MCV lists, which seems mostly non-controversial. I'll
>> wait a bit more with the two patches actually changing behavior (rebased
>> version attached, to keep cputube happy).
>>
>
>Patch 0001 looks to be mostly ready. Just a couple of final comments:
>
>+ if (is_or)
>+ simple_sel = clauselist_selectivity_simple_or(root,
>stat_clauses, varRelid,
>+ jointype,
>sjinfo, NULL, 1.0);
>+ else
>
>Surely that should be passing 0.0 as the final argument, otherwise it
>will always just return simple_sel = 1.0.
>
>
>+ *
>+ * XXX We can't multiply with current value, because for OR clauses
>+ * we start with 0.0, so we simply assign to s1 directly.
>+ */
>+ s = statext_clauselist_selectivity(root, clauses, varRelid,
>+ jointype, sjinfo, rel,
>+ &estimatedclauses, true);
>
>That final part of the comment is no longer relevant (variable s1 no
>longer exists). Probably it could now just be deleted, since I think
>there are sufficient comments elsewhere to explain what's going on.
>
>Otherwise it looks good, and I think this will lead to some very
>worthwhile improvements.
>
Attached is a rebased patch series, addressing both those issues.
I've been wondering why none of the regression tests failed because of
the 0.0 vs. 1.0 issue, but I think the explanation is pretty simple - to
make the tests stable, all the MCV lists we use are "perfect" i.e. it
represents 100% of the data. But this selectivity is used to compute
selectivity only for the part not represented by the MCV list, i.e. it's
not really used. I suppose we could add a test that would use larger
MCV item, but I'm afraid that'd be inherently unstable :-(
Another thing I was thinking about is the changes to the API. We need to
pass information whether the clauses are connected by AND or OR to a
number of places, and 0001 does that in two ways. For some functions it
adds a new parameter (called is_or), and for other functiosn it creates
a new copy of a function. So for example
- statext_mcv_clauselist_selectivity
- statext_clauselist_selectivity
got the new flag, while e.g. clauselist_selectivity gets a new "copy"
sibling called clauselist_selectivity_or.
There were two reasons for not using flag. First, clauselist_selectivity
and similar functions have to do very different stuff for these two
cases, so it'd be just one huge if/else block. Second, minimizing
breakage of third-party code - pretty much all the extensions I've seen
only work with AND clauses, and call clauselist_selectivity. Adding a
flag would break that code. (Also, there's a bit of laziness, because
this was the simplest thing to do during development.)
But I wonder if that's sufficient reason - maybe we should just add the
flag in all cases. It might break some code, but the fix is trivial (add
a false there).
Opinions?
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment | Content-Type | Size |
---|---|---|
0001-Improve-estimation-of-OR-clauses-using-exte-20200318.patch | text/plain | 20.6 KB |
0002-Support-clauses-of-the-form-Var-op-Var-20200318.patch | text/plain | 17.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2020-03-18 19:50:38 | Re: Minimal logical decoding on standbys |
Previous Message | Julien Rouhaud | 2020-03-18 19:24:00 | Re: WAL usage calculation patch |