From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Mark Dilger <hornschnorter(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Using multiple extended statistics for estimates |
Date: | 2019-11-14 21:45:41 |
Message-ID: | 20191114214541.3ssauovlhh3vsasr@development |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Nov 14, 2019 at 03:16:04PM -0500, Tom Lane wrote:
>Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>> For the case with equal Const values that should be mostly obvious, i.e.
>> "a=1 AND a=1 AND a=1" has the same selectivity as "a=1".
>
>> The case with different Const values is harder, unfortunately. It might
>> seem obvious that "a=1 AND a=2" means there are no matching rows, but
>> that heavily relies on the semantics of the equality operator. And we
>> can't simply compare the Const values either, I'm afraid, because there
>> are cases with cross-type operators like
>> a = 1::int AND a = 1.0::numeric
>> where the Consts are of different type, yet both conditions can be true.
>
>FWIW, there's code in predtest.c to handle exactly that, at least for
>types sharing a btree opfamily. Whether it's worth applying that logic
>here is unclear, but note that we've had the ability to recognize
>redundant and contradictory clauses for a long time:
>
>regression=# explain select * from tenk1 where two = 1;
> QUERY PLAN
>------------------------------------------------------------
> Seq Scan on tenk1 (cost=0.00..470.00 rows=5000 width=244)
> Filter: (two = 1)
>(2 rows)
>
>regression=# explain select * from tenk1 where two = 1 and two = 1::bigint;
> QUERY PLAN
>------------------------------------------------------------
> Seq Scan on tenk1 (cost=0.00..470.00 rows=5000 width=244)
> Filter: (two = 1)
>(2 rows)
>
>regression=# explain select * from tenk1 where two = 1 and two = 2::bigint;
> QUERY PLAN
>---------------------------------------------------------------
> Result (cost=0.00..470.00 rows=1 width=244)
> One-Time Filter: false
> -> Seq Scan on tenk1 (cost=0.00..470.00 rows=1 width=244)
> Filter: (two = 1)
>(4 rows)
>
>It falls down on
>
>regression=# explain select * from tenk1 where two = 1 and two = 2::numeric;
> QUERY PLAN
>-----------------------------------------------------------
> Seq Scan on tenk1 (cost=0.00..520.00 rows=25 width=244)
> Filter: ((two = 1) AND ((two)::numeric = '2'::numeric))
>(2 rows)
>
>because numeric isn't in the same opfamily, so these clauses can't be
>compared easily.
>
> regards, tom lane
Yeah, and this logic still works - the redundant clauses won't even get
to the selectivity estimation, I think. So maybe the comment is not
quite necessary, because the problem does not even exist ...
Maybe we could do something about the cases that predtest.c can't solve,
but it's not clear if we can be much smarter for types with different
opfamilies.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2019-11-14 21:51:25 | Re: Using multiple extended statistics for estimates |
Previous Message | Tom Lane | 2019-11-14 21:36:54 | Re: Missing dependency tracking for TableFunc nodes |