Estimate of the inner_rows

From: 陈雁飞 <postgresql_2016(at)163(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Estimate of the inner_rows
Date: 2024-09-08 08:21:12
Message-ID: 408f9706.17a2.191d0b90567.Coremail.postgresql_2016@163.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi
I encounter an query plan problem like as the following. It's contain two nodes which assume the result is 1 and 7, but however, the last result is 7418. And the actual result is just 1, but because of the result is too big, which will affect the following join methods. And I've analyze the reason, but I think we can do bettwer.

postgres=# explain select * from test t1 left join test t2 on t1.b = t2.b and t2.c = 10 where t1.a = 1;
QUERY PLAN
-----------------------------------------------------------------------------
Nested Loop Left Join (cost=0.85..48.16 rows=7418 width=24)
-> Index Scan using a_idx on test t1 (cost=0.43..8.45 rows=1 width=12)
Index Cond: (a = 1)
-> Index Scan using b_idx on test t2 (cost=0.43..39.64 rows=7 width=12)
Index Cond: (b = t1.b)
Filter: (c = 10)
(6 rows)

postgres=# \d test
Table "public.test"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
a | integer | | |
b | integer | | |
c | integer | | |
Indexes:
"a_idx" btree (a)
"b_idx" btree (b)

Throughing the source code, I know it how to get this result.
Firstly, the result 7 is assume fron the condition t1.b = t2.b and t2.c = 10, in the function clauselist_selectivity_ext compute the selectivity, and the result is:
t2.b selc * t2.c = 10 selc * ntuples
(1/134830) * (1002795/1201000) * 1201000 = 7

Secondly, when compute the join selec, the compute function is eqjoinsel,and the result function is calc_joinrel_size_estimate
case JOIN_LEFT:
nrows = outer_rows * inner_rows * fkselec * jselec;
if (nrows < outer_rows)
nrows = outer_rows;
nrows *= pselec;
break;
outer_rows is 1, inner_rows is 1002795, which is the result the estimate result of t2.c = 10, while not the 7.
So, through the analyze, I think the reason is the estimate result of inner_rows, now we just consider the condition t2.c = 10, not the condition t1.b = t2,b and t2.c = 10.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2024-09-08 16:50:34 Re: Estimate of the inner_rows
Previous Message Pavel Stehule 2024-09-03 11:41:31 Re: proposal: schema variables