From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: 陈雁飞 <postgresql_2016(at)163(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Estimate of the inner_rows
Date: 2024-09-09 03:30:42
Message-ID: 3036077.1725852642@sss.pgh.pa.us
Lists: pgsql-performance
陈雁飞 <postgresql_2016(at)163(dot)com> writes:
> 1. The first problem was found in PG 9.2, our production version, but I have tested it on the latest version as well.
You're still running 9.2 in production? That's ... inadvisable.
> 2. The data is simple: column b has many distinct values, and rows with c = 10 are very numerous. I've dumped it in test.log in the email attachment.
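For reference, data with roughly this shape can be generated as follows (a hypothetical reconstruction from the row counts in the plans below, not the actual attached test.log): mostly-unique b values, one heavily duplicated value, and c = 10 on most rows.

create table test (a int, b int, c int);
-- ~1.1M rows with all-distinct b values; most of them get c = 10
insert into test
  select g, g, case when g % 6 = 0 then 1 else 10 end
  from generate_series(1, 1096000) g;
-- one b value repeated ~105000 times: this is the skew, since 105000^2
-- is about the 1.09e10 join pairs visible in the last plan below
insert into test
  select g, 0, 10
  from generate_series(1096001, 1201000) g;
create index a_idx on test (a);
create index b_idx on test (b);
vacuum analyze test;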
Thanks for the test data. I don't think there is actually anything
wrong here, or at least nothing readily improvable. I get this
for your original query:
postgres=# explain analyze select * from test t1 left join test t2 on t1.b = t2.b and t2.c = 10 where t1.a = 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.85..16.90 rows=7567 width=24) (actual time=0.014..0.015 rows=1 loops=1)
   ->  Index Scan using a_idx on test t1  (cost=0.43..8.45 rows=1 width=12) (actual time=0.008..0.008 rows=1 loops=1)
         Index Cond: (a = 1)
   ->  Index Scan using b_idx on test t2  (cost=0.43..8.45 rows=1 width=12) (actual time=0.003..0.004 rows=1 loops=1)
         Index Cond: (b = t1.b)
         Filter: (c = 10)
Planning Time: 0.321 ms
Execution Time: 0.032 ms
(8 rows)
Slightly different from your estimate, but random sampling or a
different statistics-target setting would be enough to explain that.
It is *not* the t2.c = 10 condition that's resulting in the incorrect
estimate, because taking it out doesn't change things much:
postgres=# explain analyze select * from test t1 left join test t2 on t1.b = t2.b where t1.a = 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.85..16.90 rows=9088 width=24) (actual time=0.014..0.015 rows=1 loops=1)
   ->  Index Scan using a_idx on test t1  (cost=0.43..8.45 rows=1 width=12) (actual time=0.008..0.009 rows=1 loops=1)
         Index Cond: (a = 1)
   ->  Index Scan using b_idx on test t2  (cost=0.43..8.45 rows=1 width=12) (actual time=0.002..0.003 rows=1 loops=1)
         Index Cond: (b = t1.b)
Planning Time: 0.312 ms
Execution Time: 0.031 ms
(7 rows)
Next, let's take out the t1.a = 1 condition:
postgres=# explain analyze select * from test t1 left join test t2 on t1.b = t2.b ;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.43..602605.00 rows=10914402957 width=24) (actual time=0.023..1872651.044 rows=10914402000 loops=1)
   ->  Seq Scan on test t1  (cost=0.00..18506.00 rows=1201000 width=12) (actual time=0.013..62.837 rows=1201000 loops=1)
   ->  Index Scan using b_idx on test t2  (cost=0.43..0.48 rows=1 width=12) (actual time=0.001..0.945 rows=9088 loops=1201000)
         Index Cond: (b = t1.b)
Planning Time: 0.297 ms
Execution Time: 2062621.438 ms
(6 rows)
The fundamental join-size estimate, that is the selectivity of t1.b =
t2.b, is pretty much dead on here. And note the actual rows=9088 in
the inner indexscan. What we see here is that the actual average
number of t2 rows joining to a t1 row is 9088. Now the reason for the
estimate for the second query becomes clear: the planner doesn't know
exactly how many rows join to the specific row with t1.a = 1, but it
knows that the average number of joined rows should be 9088, so that's
its estimate. In your original query, that is reduced a little bit by
the not-very-selective "c = 10" condition, but the big picture is the
same. Basically, the row with "a = 1" has an atypical number of join
partners and that's why the join size estimate is wrong for this
particular query.
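A quick check of that arithmetic, using the numbers from the plans above:

select 10914402957::numeric / 1201000 as avg_fanout,  -- ≈ 9087.8, rounding to the 9088-row estimate
       7567::numeric / 9088 as c10_selectivity;       -- ≈ 0.83, the estimated fraction passing c = 10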
You might nonetheless complain that the join size estimate should
be the product of the two input-scan estimates, but that's not
how it works. (Typically those do match up, but with highly skewed
data like this the fact that they're derived differently can become
obvious.) The size of the b_idx result is based on considering the
selectivity of "t2.b = ?" where the comparison value is not known.
Because there are so many b values that are unique, we end up
estimating the average number of matching rows as 1 even though
it will be far higher for a few values.
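You can see where that figure comes from in the stored statistics: for "t2.b = ?" with an unknown comparison value, the planner falls back to roughly one over the column's distinct-value count (var_eq_non_const in selfuncs.c). A minimal look, assuming the table has been ANALYZEd:

select n_distinct
from pg_stats
where tablename = 'test' and attname = 'b';
-- An n_distinct near the row count (or a negative value near -1, meaning
-- "scales with the table") makes reltuples / n_distinct round to 1.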
regards, tom lane