From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [HACKERS] <> join selectivity estimate question |
Date: | 2017-12-03 17:40:16 |
Message-ID: | 11738.1512322816@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> So, in that plan we saw the anti-join estimate 1 row but really there were
> 13462. If you remove most of Q21 and keep just the anti-join between
> l1 and l3, then try removing different quals, you can see that the
> problem is not the <> qual:
> select count(*)
>   from lineitem l1
>  where not exists (
>        select *
>          from lineitem l3
>         where l3.l_orderkey = l1.l_orderkey
>           and l3.l_suppkey <> l1.l_suppkey
>           and l3.l_receiptdate > l3.l_commitdate
>  )
> => estimate=1 actual=8998304
ISTM this is basically another variant of ye olde column correlation
problem. That is, we know there's always going to be an antijoin match
for the l_orderkey equality condition, and that there's always going to
be matches for the l_suppkey inequality, but what we don't know is that
l_suppkey is correlated with l_orderkey so that the two conditions aren't
satisfied at the same time. The same thing is happening on a smaller
scale with the receiptdate/commitdate comparison.
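To make the correlation point concrete, here is a minimal Python sketch on hypothetical toy data. It is not TPC-H data and not PostgreSQL's actual estimator code. It assumes an extreme case in which every order's line items all come from a single supplier, so l_suppkey is fully determined by l_orderkey. It compares the selectivity that an independence assumption predicts for the combined "same orderkey AND different suppkey" condition with the true joint selectivity, and counts how many outer rows survive the NOT EXISTS. The names ROWS, PAIRS, same_order, diff_supp and fraction are illustrative only.

```python
from itertools import product

# Hypothetical toy "lineitem" rows as (orderkey, suppkey) tuples.
# Extreme-correlation assumption for illustration only: each of 10 orders
# has 4 line items, all from a single supplier, so suppkey is fully
# determined by orderkey.
ROWS = [(order, 100 + order) for order in range(10) for _ in range(4)]
PAIRS = list(product(ROWS, repeat=2))  # every (l1, l3) combination


def same_order(l1, l3):
    return l3[0] == l1[0]  # l3.l_orderkey = l1.l_orderkey


def diff_supp(l1, l3):
    return l3[1] != l1[1]  # l3.l_suppkey <> l1.l_suppkey


def fraction(pred):
    """Fraction of all (l1, l3) pairs for which pred holds."""
    return sum(1 for l1, l3 in PAIRS if pred(l1, l3)) / len(PAIRS)


sel_eq = fraction(same_order)
sel_ne = fraction(diff_supp)
# Treating the two quals as independent: multiply their selectivities.
independent = sel_eq * sel_ne
# True joint selectivity of "same order AND different supplier".
actual = fraction(lambda l1, l3: same_order(l1, l3) and diff_supp(l1, l3))

# Outer rows that survive NOT EXISTS (no inner row satisfies both quals).
survivors = sum(
    1 for l1 in ROWS
    if not any(same_order(l1, l3) and diff_supp(l1, l3) for l3 in ROWS)
)

print(f"sel(=)={sel_eq:.3f} sel(<>)={sel_ne:.3f}")
print(f"independent estimate={independent:.3f} actual={actual:.3f}")
print(f"anti-join survivors: {survivors} of {len(ROWS)}")
```

This prints sel(=)=0.100 and sel(<>)=0.900, an independent estimate of 0.090 against an actual joint selectivity of 0.000, and 40 of 40 anti-join survivors. Multiplying the two selectivities implies about 3.6 matching inner rows per outer row (40 × 0.09), so reasoning under independence would expect almost no outer row to survive, which is the shape of the estimate=1 versus actual=8998304 gap quoted above. Real TPC-H data is only partially correlated, so the effect there is less extreme than in this toy.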
I wonder whether the extended stats machinery could be brought to bear
on this problem.
regards, tom lane