Re: BUG #18643: EXPLAIN estimated rows mismatch

From: ming wei tan <mingwei(dot)tc(at)gmail(dot)com>
To: Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18643: EXPLAIN estimated rows mismatch
Date: 2024-10-02 02:04:36
Message-ID: CAALvCkZz+7hpdsg-cSV93QK4jwVw1z7E5CS3+w-mmCp__Tpu0g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 1/10/2024 18:43, Tom Lane wrote:
> In any case, in this toy example that lacks an ANALYZE step,
> the selectivity estimates are mostly going to be garbage.

Thanks for the replies. I'm just checking if a bug is present here
is a bug. Even with ANALYZE, the first EXPLAIN estimates more rows
compared to the second, even though the second WHERE clause is
less restrictive.

ANALYZE;

ANALYZE

EXPLAIN SELECT t2.c0 FROM t2 WHERE t2.c0 IN (t2.c0);
QUERY PLAN
--------------------------------------------------
Seq Scan on t2 (cost=0.00..1.02 rows=2 width=4)
Filter: (c0 IS NOT NULL)
(2 rows)

EXPLAIN SELECT t2.c0 FROM t2 WHERE (t2.c0 IN (t2.c0)) OR (t2.c0 > 4);
QUERY PLAN
--------------------------------------------------
Seq Scan on t2 (cost=0.00..1.03 rows=1 width=4)
Filter: ((c0 = c0) OR (c0 > 4))
(2 rows)

DROP DATABASE IF EXISTS database4;
CREATE DATABASE database4 WITH ENCODING 'UTF8' TEMPLATE template0;
\c database4;
CREATE TABLE t2(c0 int);
INSERT INTO t2(c0) VALUES(1);
INSERT INTO t2(c0) VALUES(2);
ANALYZE;
EXPLAIN SELECT t2.c0 FROM t2 WHERE t2.c0 IN (t2.c0);
EXPLAIN SELECT t2.c0 FROM t2 WHERE (t2.c0 IN (t2.c0)) OR (t2.c0 > 4);

Regards,
Ming Wei

On Wed, 2 Oct 2024 at 08:35, Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:

> On 1/10/2024 18:43, Tom Lane wrote:
> > PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> >> Given predicate A and B, it is expected that size (SELECT X where A) <=
> >> size (SELECT X WHERE A or B)
> >> However, `EXPLAIN SELECT t2.c0 FROM t2 WHERE t2.c0 IN (t2.c0)` returns
> >> rows=2537
> >
> > I don't see any particular bug here. If you look closely at the
> > EXPLAIN output, you'll see that "t2.c0 IN (t2.c0)" is transformed
> > to "c0 IS NOT NULL" --- but only if it's at top level. So we're
> > estimating selectivities for two quite different conditions in
> > this example.
> >
> > The NOT NULL bit happens because a top-level equality clause
> > is transformed into an "EquivalenceClass", and then when we
> > notice the class has only one member, we prefer to spit out
> > "x IS NOT NULL" rather than "x = x". That has the same effect
> > (at top level of WHERE, anyway) and tends to be estimated
> > more accurately.
> I think their question was about why 'x IN (x)' transforms differently
> at the top and inside the OR clause. It is pretty typical question.
>
> --
> regards, Andrei Lepikhov
>
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Laurenz Albe 2024-10-02 04:16:31 Re: Linux OOM killer
Previous Message Andrei Lepikhov 2024-10-02 00:44:35 Re: Reference to - BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker