BUG #16759: Estimation of the planner is wrong for hash join

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: bertrand(dot)guillaumin(at)gmail(dot)com
Subject: BUG #16759: Estimation of the planner is wrong for hash join
Date: 2020-12-02 10:43:50
Message-ID: 16759-f0980add03d8f152@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 16759
Logged by: Bertrand Guillaumin
Email address: bertrand(dot)guillaumin(at)gmail(dot)com
PostgreSQL version: 11.8
Operating system: Linux
Description:

The following query estimated number of lines returned is 1 while it should
be around 67 or more :
explain analyze select * from enterprise where parent_enterprise in (select
enterprise_id from enterprise par where global_attribute15 = 'BEL');
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Hash Join (cost=95.60..191.33 rows=1 width=977) (actual time=0.422..0.896
rows=56 loops=1)
Hash Cond: (enterprise.parent_enterprise = par.enterprise_id)
-> Seq Scan on enterprise (cost=0.00..92.87 rows=1087 width=977)
(actual time=0.004..0.095 rows=1087 loops=1)
-> Hash (cost=95.59..95.59 rows=1 width=5) (actual time=0.397..0.398
rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on enterprise par (cost=0.00..95.59 rows=1 width=5)
(actual time=0.251..0.394 rows=1 loops=1)
Filter: (global_attribute15 = 'BEL'::text)
Rows Removed by Filter: 1086

The same issue with a normal join :
explain analyze select * from enterprise ent1, enterprise ent2 where
ent1.parent_enterprise=ent2.enterprise_id and ent2.global_attribute15 =
'BEL';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=95.60..191.33 rows=1 width=1954) (actual time=0.444..0.954
rows=56 loops=1)
Hash Cond: (ent1.parent_enterprise = ent2.enterprise_id)
-> Seq Scan on enterprise ent1 (cost=0.00..92.87 rows=1087 width=977)
(actual time=0.004..0.104 rows=1087 loops=1)
-> Hash (cost=95.59..95.59 rows=1 width=977) (actual time=0.416..0.416
rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on enterprise ent2 (cost=0.00..95.59 rows=1
width=977) (actual time=0.252..0.410 rows=1 loops=1)
Filter: (global_attribute15 = 'BEL'::text)
Rows Removed by Filter: 1086

The statistics for parent_enterprise :
select * from pg_stats where tablename='enterprise' and
attname='parent_enterprise';
schemaname | tablename | attname | inherited | null_frac |
avg_width | n_distinct | most_common_vals
|
most_common_freqs |
histogram_bounds | correlation | most_common_elems |
most_common_elem_freqs | elem_count_histogram
------------+------------+-------------------+-----------+------------+-----------+------------+-----------------------------------------------------+-------------------------------------------------
-------------------------------------------------------------------+-------------------------+-------------+-------------------+------------------------+----------------------
xxxxxxx | enterprise | parent_enterprise | f | 0.00551978 |
4 | 16 | {48,682,6162,6639,6448,46,6630,6796,6553,6812,6854} |
{0.551058,0.184913,0.0818767,0.0772769,0.0515179
,0.0137994,0.00919963,0.00827967,0.00643974,0.00367985,0.00183993} |
{0,6831,6853,6904,6917} | 0.755042 | |
|
(1 row)

The right estimation when using = (not an option here):
explain select * from enterprise where parent_enterprise = (select
enterprise_id from enterprise par where global_attribute15 = 'BEL' limit
1);
QUERY PLAN
-----------------------------------------------------------------------
Seq Scan on enterprise (cost=95.59..191.18 rows=68 width=977)
Filter: (parent_enterprise = $0)
InitPlan 1 (returns $0)
-> Seq Scan on enterprise par (cost=0.00..95.59 rows=1 width=5)
Filter: (global_attribute15 = 'BEL'::text)

The wrong estimation leads to issues in bigger queries.

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Dave Page 2020-12-02 13:15:44 Re: [External] Re: pgadmin--pgagent---the process hang by unknow reasons
Previous Message Zhiyu ZY13 Xu 2020-12-02 06:50:30 答复: [External] Re: pgadmin--pgagent---the process hang by unknow reasons