From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Additional improvements to extended statistics |
Date: | 2020-03-09 00:01:57 |
Message-ID: | 20200309000157.ig5tcrynvaqu4ixd@development |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, Mar 08, 2020 at 07:17:10PM +0000, Dean Rasheed wrote:
>On Fri, 6 Mar 2020 at 12:58, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>> Here is a rebased version of this patch series. I've polished the first
>> two parts a bit - estimation of OR clauses and (Var op Var) clauses.
>>
>
>Hi,
>
>I've been looking over the first patch (OR list support). It mostly
>looks reasonable to me, except there's a problem with the way
>statext_mcv_clauselist_selectivity() combines multiple stat_sel values
>into the final result -- in the OR case, it needs to start with sel =
>0, and then apply the OR formula to factor in each new estimate. I.e.,
>this isn't right for an OR list:
>
> /* Factor the estimate from this MCV to the oveall estimate. */
> sel *= stat_sel;
>
>(Oh and there's a typo in that comment: s/oveall/overall/).
>
>For example, with the regression test data, this isn't estimated well:
>
> SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0;
>
>Similarly, if no extended stats can be applied it needs to return 0
>not 1, for example this query on the test data:
>
> SELECT * FROM mcv_lists WHERE a = 1 OR a = 2 OR d IS NOT NULL;
>
Ah, right. Thanks for noticing this. Attaches is an updated patch series
with parts 0002 and 0003 adding tests demonstrating the issue and then
fixing it (both shall be merged to 0001).
>It might also be worth adding a couple more regression test cases like these.
Agreed, 0002 adds a couple of relevant tests.
Incidentally, I've been working on improving test coverage for extended
stats over the past few days (it has ~80% lines covered, which is not
bad nor great). I haven't submitted that to hackers yet, because it's
mostly mechanical and it's would interfere with the two existing threads
about extended stats ...
Speaking of which, would you take a look at [1]? I think supporting SAOP
is fine, but I wonder if you agree with my conclusion we can't really
support inclusion @> as explained in [2].
[1] https://www.postgresql.org/message-id/flat/13902317(dot)Eha0YfKkKy(at)pierred-pdoc
[2] https://www.postgresql.org/message-id/20200202184134.swoqkqlqorqolrqv%40development
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2020-03-09 00:06:21 | Re: Additional improvements to extended statistics |
Previous Message | Jesse Zhang | 2020-03-08 23:44:21 | Re: Use compiler intrinsics for bit ops in hash |