Re: Using multiple extended statistics for estimates

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mark Dilger <hornschnorter(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Using multiple extended statistics for estimates
Date: 2019-11-14 21:45:41
Message-ID: 20191114214541.3ssauovlhh3vsasr@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 14, 2019 at 03:16:04PM -0500, Tom Lane wrote:
>Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>> For the case with equal Const values that should be mostly obvious, i.e.
>> "a=1 AND a=1 AND a=1" has the same selectivity as "a=1".
>
>> The case with different Const values is harder, unfortunately. It might
>> seem obvious that "a=1 AND a=2" means there are no matching rows, but
>> that heavily relies on the semantics of the equality operator. And we
>> can't simply compare the Const values either, I'm afraid, because there
>> are cases with cross-type operators like
>> a = 1::int AND a = 1.0::numeric
>> where the Consts are of different type, yet both conditions can be true.
>
>FWIW, there's code in predtest.c to handle exactly that, at least for
>types sharing a btree opfamily. Whether it's worth applying that logic
>here is unclear, but note that we've had the ability to recognize
>redundant and contradictory clauses for a long time:
>
>regression=# explain select * from tenk1 where two = 1;
> QUERY PLAN
>------------------------------------------------------------
> Seq Scan on tenk1 (cost=0.00..470.00 rows=5000 width=244)
> Filter: (two = 1)
>(2 rows)
>
>regression=# explain select * from tenk1 where two = 1 and two = 1::bigint;
> QUERY PLAN
>------------------------------------------------------------
> Seq Scan on tenk1 (cost=0.00..470.00 rows=5000 width=244)
> Filter: (two = 1)
>(2 rows)
>
>regression=# explain select * from tenk1 where two = 1 and two = 2::bigint;
> QUERY PLAN
>---------------------------------------------------------------
> Result (cost=0.00..470.00 rows=1 width=244)
> One-Time Filter: false
> -> Seq Scan on tenk1 (cost=0.00..470.00 rows=1 width=244)
> Filter: (two = 1)
>(4 rows)
>
>It falls down on
>
>regression=# explain select * from tenk1 where two = 1 and two = 2::numeric;
> QUERY PLAN
>-----------------------------------------------------------
> Seq Scan on tenk1 (cost=0.00..520.00 rows=25 width=244)
> Filter: ((two = 1) AND ((two)::numeric = '2'::numeric))
>(2 rows)
>
>because numeric isn't in the same opfamily, so these clauses can't be
>compared easily.
>
> regards, tom lane

Yeah, and this logic still works - the redundant clauses won't even get
to the selectivity estimation, I think. So maybe the comment is not
quite necessary, because the problem does not even exist ...

Maybe we could do something about the cases that predtest.c can't solve,
but it's not clear if we can be much smarter for types with different
opfamilies.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-11-14 21:51:25 Re: Using multiple extended statistics for estimates
Previous Message Tom Lane 2019-11-14 21:36:54 Re: Missing dependency tracking for TableFunc nodes