Re: Using multiple extended statistics for estimates

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Mark Dilger <hornschnorter(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Using multiple extended statistics for estimates
Date: 2019-11-14 20:16:04
Message-ID: 10082.1573762564@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> For the case with equal Const values that should be mostly obvious, i.e.
> "a=1 AND a=1 AND a=1" has the same selectivity as "a=1".

> The case with different Const values is harder, unfortunately. It might
> seem obvious that "a=1 AND a=2" means there are no matching rows, but
> that heavily relies on the semantics of the equality operator. And we
> can't simply compare the Const values either, I'm afraid, because there
> are cases with cross-type operators like
> a = 1::int AND a = 1.0::numeric
> where the Consts are of different type, yet both conditions can be true.

FWIW, there's code in predtest.c to handle exactly that, at least for
types sharing a btree opfamily. Whether it's worth applying that logic
here is unclear, but note that we've had the ability to recognize
redundant and contradictory clauses for a long time:

regression=# explain select * from tenk1 where two = 1;
QUERY PLAN
------------------------------------------------------------
Seq Scan on tenk1 (cost=0.00..470.00 rows=5000 width=244)
Filter: (two = 1)
(2 rows)

regression=# explain select * from tenk1 where two = 1 and two = 1::bigint;
QUERY PLAN
------------------------------------------------------------
Seq Scan on tenk1 (cost=0.00..470.00 rows=5000 width=244)
Filter: (two = 1)
(2 rows)

regression=# explain select * from tenk1 where two = 1 and two = 2::bigint;
QUERY PLAN
---------------------------------------------------------------
Result (cost=0.00..470.00 rows=1 width=244)
One-Time Filter: false
-> Seq Scan on tenk1 (cost=0.00..470.00 rows=1 width=244)
Filter: (two = 1)
(4 rows)

It falls down on

regression=# explain select * from tenk1 where two = 1 and two = 2::numeric;
QUERY PLAN
-----------------------------------------------------------
Seq Scan on tenk1 (cost=0.00..520.00 rows=25 width=244)
Filter: ((two = 1) AND ((two)::numeric = '2'::numeric))
(2 rows)

because numeric isn't in the same opfamily, so these clauses can't be
compared easily.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2019-11-14 20:16:08 Re: SKIP_LOCKED test causes random buildfarm failures
Previous Message Tomas Vondra 2019-11-14 20:04:20 Re: Using multiple extended statistics for estimates