Quick Links

Inefficient plan on 10.4

From:	"Peter J(dot) Holzer" <hjp-pgsql(at)hjp(dot)at>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Inefficient plan on 10.4
Date:	2018-07-05 13:17:51
Message-ID:	20180705131751.so2x3xxemmtr24nj@hjp.at
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

I have this table:

which has about 150 million rows:

wdsah=> select count(*) from relation;
count
-----------
147810590
(1 row)

I'm trying to get to get the siblings of a node (including itself):

select r2.parent, r2.type, r2.child
from relation r1, relation r2
where r1.child=643541 and r2.parent=r1.parent
order by r2.type

This worked fine on 9.5, but on 10.4 it takes several seconds:

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather Merge (cost=12032221.68..25195781.12 rows=112822632 width=15) (actual time=4086.255..4086.257 rows=2 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=12031221.66..12172249.95 rows=56411316 width=15) (actual time=4080.862..4080.862 rows=1 loops=3)
Sort Key: r2.type
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=0.57..1876228.04 rows=56411316 width=15) (actual time=3036.547..4080.826 rows=1 loops=3)
-> Parallel Seq Scan on relation r1 (cost=0.00..1856722.83 rows=1 width=4) (actual time=3036.525..4080.802 rows=0 loops=3)
Filter: (child = 643541)
Rows Removed by Filter: 49270196
-> Index Scan using relation_parent_idx on relation r2 (cost=0.57..17041.69 rows=246351 width=15) (actual time=0.053..0.055 rows=2 loops=1)
Index Cond: (parent = r1.parent)
Planning time: 0.418 ms
Execution time: 4090.442 ms

If I disable sequential scans, it works fine again:

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=25246497.92..25584965.81 rows=135387158 width=15) (actual time=0.119..0.119 rows=2 loops=1)
Sort Key: r2.type
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=1.14..19513.79 rows=135387158 width=15) (actual time=0.101..0.104 rows=2 loops=1)
-> Index Scan using relation_child_idx on relation r1 (cost=0.57..8.59 rows=1 width=4) (actual time=0.079..0.080 rows=1 loops=1)
Index Cond: (child = 643541)
-> Index Scan using relation_parent_idx on relation r2 (cost=0.57..17041.69 rows=246351 width=15) (actual time=0.018..0.020 rows=2 loops=1)
Index Cond: (parent = r1.parent)
Planning time: 0.446 ms
Execution time: 0.165 ms

There are several points about these plans that I don't understand:

* Why does the nested loop expect 56E6 or even 135E6 rows? It expects 1
row for the outer table and then 246351 rows in the inner table for
each of them. 1 * 246351 == 246351. So it should expect 246351 rows.
(246351 itself is way too high, but the table has a very skewed
distribution, and I've already set the statistics target to the
maximum of 10000, so there's not much I can do about that)

* Why does the Parallel Seq Scan report actual rows=0? It did return 1
row (or is that the average per worker? That would be < 1, and
probably rounded down to 0)

PS: The obvious workaround is to remove "order by r2.type". I can easily
get the required partial order in the application. But I'd like to
understand what the optimizer is doing here.

Browse pgsql-general by date

	From	Date	Subject
Next Message	David G. Johnston	2018-07-05 13:18:11	Re: FK v.s unique indexes
Previous Message	Tom van Tilburg	2018-07-05 12:44:31	Re: Postgres sometimes stalling on 'percentile_cont'