From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alexey Ermakov <alexey(dot)ermakov(at)dataegret(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15160: planner overestimates number of rows in join when there are more than 200 rows coming from CTE
Date: 2018-09-25 11:24:39
Message-ID: CAK-MWwR_G0NLLB9TwRhs+3cHDa9zZnYg7AgpAfeYJn_hgpZBsQ@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers
On Tue, Apr 17, 2018 at 5:15 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> > I'm wondering how planner estimates number of rows in that case:
>
> See eqjoinsel_semi, particularly the change in behavior when it thinks
> nd2 is or is not a default estimate.
>
> Given the lack of statistics about the output of the WITH clause,
> it's hard to see how we'd ever get trustworthy estimates here.
> I think the fact that your first example yields an accurate
> estimate is mostly luck.
>
> regards, tom lane
>
>
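For context: the per-column statistics that eqjoinsel_semi consults live in
pg_stats, and the output of a WITH clause or subquery has no entry there, so
get_variable_numdistinct() has to guess. A quick illustrative query (my
sketch only, using the resume table from the examples below) to see what the
planner does have for a real column:

SELECT tablename, attname, n_distinct
FROM pg_stats
WHERE tablename = 'resume'
  AND attname = 'resume_id';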
There is a similar issue without a CTE, which looks pretty weird.
Good case with LIMIT 199 and an adequate estimate:
hh=# explain SELECT * FROM resume WHERE resume_id IN (select id from
generate_series(1, 1000) gs(id) LIMIT 199);
                                             QUERY PLAN
---------------------------------------------------------------------------------------------------
 Nested Loop  (cost=21.53..108.98 rows=199 width=519)
   ->  Unique  (cost=21.42..21.62 rows=199 width=4)
         ->  Sort  (cost=21.42..21.52 rows=199 width=4)
               Sort Key: gs.id
               ->  Limit  (cost=0.00..9.95 rows=199 width=4)
                     ->  Function Scan on generate_series gs  (cost=0.00..50.00 rows=1000 width=4)
   ->  Index Scan using resume_pk on resume  (cost=0.11..0.39 rows=1 width=519)
         Index Cond: (resume_id = gs.id)
Very bad case with an awful estimate (the only difference is LIMIT 200
instead of LIMIT 199):
explain SELECT * FROM resume WHERE resume_id IN (select id from
generate_series(1, 1000) gs(id) LIMIT 200);
                                             QUERY PLAN
---------------------------------------------------------------------------------------------------
 Nested Loop  (cost=21.64..109.53 rows=45860504 width=519)
   ->  Unique  (cost=21.53..21.73 rows=200 width=4)
         ->  Sort  (cost=21.53..21.63 rows=200 width=4)
               Sort Key: gs.id
               ->  Limit  (cost=0.00..10.00 rows=200 width=4)
                     ->  Function Scan on generate_series gs  (cost=0.00..50.00 rows=1000 width=4)
   ->  Index Scan using resume_pk on resume  (cost=0.11..0.39 rows=1 width=519)
         Index Cond: (resume_id = gs.id)
It's not a problem by itself, but once you start using this query with more
joined tables, a lot of bad things happen because of the
five-orders-of-magnitude error in the selectivity estimate. The cliff at
exactly 200 rows appears to match DEFAULT_NUM_DISTINCT: below that cutoff
get_variable_numdistinct() simply uses the relation's row count, while at 200
rows it returns the default value and flags it as such, pushing
eqjoinsel_semi onto a much more pessimistic path.
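A possible workaround (only a sketch, untested): materialize the id list
into a temp table and ANALYZE it, so the planner sees real ndistinct
statistics instead of falling back to the default:

CREATE TEMP TABLE ids AS
    SELECT id FROM generate_series(1, 1000) gs(id) LIMIT 200;
ANALYZE ids;  -- temp tables are not autoanalyzed, so do it by hand

EXPLAIN SELECT * FROM resume WHERE resume_id IN (SELECT id FROM ids);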
PS: in practice this forces us to keep LIMIT at no more than 199 in complex
joins for batch operations, or the database starts generating funny plans.
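Another rewrite that may help (again just a sketch): pass the ids as an
array, so the condition is estimated as a ScalarArrayOpExpr rather than
going through the semi-join selectivity code at all:

EXPLAIN SELECT * FROM resume
WHERE resume_id = ANY (ARRAY(SELECT id FROM generate_series(1, 1000)
                             gs(id) LIMIT 200));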
Regards,
Maxim
--
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/
Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно
когда я так делаю ещё раз?"