From: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
---|---|
To: | Bertrand Guillaumin <bertrand(dot)guillaumin(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #16759: Estimation of the planner is wrong for hash join |
Date: | 2020-12-18 01:14:54 |
Message-ID: | 18e3e4e6-2816-77fa-84c2-b85448fbe6ce@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On 12/17/20 6:36 PM, Bertrand Guillaumin wrote:
> Hello,
> I think I just made a test that shows that even with MCV on both sides
> the estimated selectivity can be pretty wrong.
> test=# create table test_bug_hash2 as SELECT mod(id,500) as id , case
> when id<=500 then 1 else 2 end parent_id, null::text as attrib from
> (select generate_series(1,1000) as id) alias0;
> SELECT 1000
> test=# update test_bug_hash2 set attrib='BEL' where id=2;
> UPDATE 2
> test=# analyze test_bug_hash2;
> ANALYZE
> test=# explain select * from test_bug_hash2 a, test_bug_hash2 b where
> a.parent_id=b.id <http://b.id> and b.attrib='BEL';
> QUERY PLAN
> ------------------------------------------------------------------------------
> Hash Join (cost=17.52..37.56 rows=4 width=24)
> Hash Cond: (a.parent_id = b.id <http://b.id>)
> -> Seq Scan on test_bug_hash2 a (cost=0.00..15.00 rows=1000 width=12)
> -> Hash (cost=17.50..17.50 rows=2 width=12)
> -> Seq Scan on test_bug_hash2 b (cost=0.00..17.50 rows=2
> width=12)
> Filter: (attrib = 'BEL'::text)
>
> test=# select count(*) from test_bug_hash2 a, test_bug_hash2 b where
> a.parent_id=b.id <http://b.id> and b.attrib='BEL';
> count
> -------
> 1000
>
> I won't copy paste the pg_stats lines but most_common_vals and
> most_common_freq have values for all three columns.
>
> I'm not a programmer but I've looked into the code of the planner a
> little bit and it seems you try to estimate the selectivity of a join in
> itself without any regards to the filters that can be applied on any
> side of the join ( if I understood correctly).
> I think that it's ultimately where the problem lies, if the filter is
> not too important the selectivity stays more or less the same but with
> filters like the one in this query the selectivity of the join can
> change a lot so in the end you get estimations which can be totally wrong.
>
>
Right. The problem is that the two columns are correlated, thanks to how
you set the attrib value only for id=2, but the join estimation code is
oblivious to that.
Perhaps the multi-column/extended stats might allow us to improve this,
but the code has not been written yet.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | PG Bug reporting form | 2020-12-18 07:20:36 | BUG #16778: Problem: nothing provides libarmadillo.so.9()(64bit) needed by gdal30-libs-3.0.4-2.sles12.x86_64 |
Previous Message | Bertrand Guillaumin | 2020-12-17 17:36:45 | Re: BUG #16759: Estimation of the planner is wrong for hash join |