From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: alexey(dot)ermakov(at)dataegret(dot)com
Subject: BUG #15160: planner overestimates number of rows in join when there are more than 200 rows coming from CTE
Date: 2018-04-17 09:40:50
Message-ID: 152395805004.19366.3107109716821067806@wrigleys.postgresql.org
Lists: pgsql-bugs pgsql-hackers
The following bug has been logged on the website:
Bug reference: 15160
Logged by: Alexey Ermakov
Email address: alexey(dot)ermakov(at)dataegret(dot)com
PostgreSQL version: 10.3
Operating system: Linux
Description:
Hello,
I'm wondering how the planner estimates the number of rows in this case:
create table test_in (id int primary key);
insert into test_in select id from generate_series(1,1000000) gs(id);
analyze test_in;
explain analyze with ids as (select id from generate_series(1,1000) gs(id) limit 200)
select * from test_in where id in (select id from ids);
-------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=6.93..139.79 rows=200 width=4) (actual time=0.129..0.400 rows=200 loops=1)
   CTE ids
     ->  Limit  (cost=0.00..2.00 rows=200 width=4) (actual time=0.050..0.066 rows=200 loops=1)
           ->  Function Scan on generate_series gs  (cost=0.00..10.00 rows=1000 width=4) (actual time=0.050..0.057 rows=200 loops=1)
   ->  HashAggregate  (cost=4.50..6.50 rows=200 width=4) (actual time=0.117..0.133 rows=200 loops=1)
         Group Key: ids.id
         ->  CTE Scan on ids  (cost=0.00..4.00 rows=200 width=4) (actual time=0.051..0.086 rows=200 loops=1)
   ->  Index Only Scan using test_in_pkey on test_in  (cost=0.42..0.66 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=200)
         Index Cond: (id = ids.id)
         Heap Fetches: 200
 Planning time: 0.128 ms
 Execution time: 0.434 ms
explain analyze with ids as (select id from generate_series(1,1000) gs(id) limit 201)
select * from test_in where id in (select id from ids);
-------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=6.96..132.78 rows=500000 width=4) (actual time=0.119..0.389 rows=201 loops=1)
   CTE ids
     ->  Limit  (cost=0.00..2.01 rows=201 width=4) (actual time=0.048..0.064 rows=201 loops=1)
           ->  Function Scan on generate_series gs  (cost=0.00..10.00 rows=1000 width=4) (actual time=0.048..0.056 rows=201 loops=1)
   ->  HashAggregate  (cost=4.52..6.52 rows=200 width=4) (actual time=0.113..0.130 rows=201 loops=1)
         Group Key: ids.id
         ->  CTE Scan on ids  (cost=0.00..4.02 rows=201 width=4) (actual time=0.049..0.083 rows=201 loops=1)
   ->  Index Only Scan using test_in_pkey on test_in  (cost=0.42..0.66 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=201)
         Index Cond: (id = ids.id)
         Heap Fetches: 201
 Planning time: 0.068 ms
 Execution time: 0.417 ms
Please note that in the first example we get the correct estimate of the total
number of rows, 200, but in the second one, where the CTE returned 201 rows
instead of 200, the total is estimated as 500000 rows (half of table test_in).
That is way off and could lead to a non-optimal plan and poor performance.
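For what it's worth, the bad estimate is exactly a 0.5 selectivity applied to the whole table; a quick sanity check of the numbers taken from the plans above:

```python
# Figures taken from the two plans above.
table_rows = 1_000_000  # rows in test_in
bad_estimate = 500_000  # planner's row estimate with LIMIT 201

# The misestimate is exactly half of test_in, i.e. a join selectivity of 0.5.
assert bad_estimate / table_rows == 0.5
```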
I get the same estimate if I replace the IN clause with an equivalent EXISTS
subquery; a plain join estimates the number of rows fine (but it is not
equivalent in the general case, when test_in.id is not unique).
Reproduced on 9.5, 9.6, and 10. Interestingly, in PostgreSQL 10 the threshold
is 200 rows, but in previous versions it is 199.
I suspect the 0.5 selectivity somehow comes from the
compute_semi_anti_join_factors function in costsize.c, but I'm not sure.
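To illustrate the suspicion, here is a hypothetical, deliberately simplified model of what I think might be going on (this is not the actual planner code): if the distinct-count estimate for the statistics-less CTE output is trusted only up to some default cap of 200 distinct values, and the planner falls back to a flat 0.5 selectivity once that default kicks in, the observed threshold follows:

```python
# Hypothetical, simplified model of the suspected estimation behavior.
# Both the 200-distinct default and the 0.5 fallback are assumptions,
# not the actual costsize.c/selfuncs.c logic.
DEFAULT_NUM_DISTINCT = 200

def estimated_semijoin_rows(outer_rows: int, inner_rows: int) -> float:
    if inner_rows <= DEFAULT_NUM_DISTINCT:
        # The inner row count is small enough to serve directly as the
        # distinct-value estimate, so the estimate tracks it exactly.
        return float(inner_rows)
    # No usable distinct estimate: punt to a selectivity of 0.5.
    return 0.5 * outer_rows

print(estimated_semijoin_rows(1_000_000, 200))  # 200.0 (matches first plan)
print(estimated_semijoin_rows(1_000_000, 201))  # 500000.0 (matches second plan)
```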
Thanks,
Alexey Ermakov