single table - fighting a seq scan

From: Radoslav Nedyalkov <rnedyalkov(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: single table - fighting a seq scan
Date: 2020-07-14 17:29:03
Message-ID: CANhtRiZMKJEHHK-Y6HEkYKHNULWshbks37EVDj=wLYdhZ0_rGA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Forum,
I'm scratching my head around the following case:

*te* is a 80M rows, 100GB table. It is a bare simple select over indexed
attribute of it.

EXPLAIN SELECT te.id FROM te WHERE te.current_pid IN (240900026,
240900027,
240900028,
-- 200 entries ...

Gather (cost=1000.00..61517367.85 rows=3870 width=8)
Workers Planned: 2
-> Parallel Seq Scan on te (cost=0.00..61515980.85 rows=1612 width=8)
Filter: (current_pid = ANY
('{240900026,240900027,...240901129}'::bigint[]))
Execution time is about 5 minutes

Reducing number of current_pids to 100 changes the plan and it does index
scan. (101 still does seq scan)

Index Scan using te_current_pid_idx on te (cost=0.57..731.26 rows=3832
width=8) (actual time=0.566..1.667 rows=600 loops=1)
Index Cond: (current_pid = ANY
('{240900026,240900027,...240900194}'::bigint[]))
Planning Time: 3.152 ms
Execution Time: 1.732 ms

Selecting 200 pids rewritten with CTE goes for index too.

EXPLAIN ANALYZE
WITH cte as (
select * from unnest(ARRAY[
240900026,
240900027,
240900028,
...
240901129
]))
SELECT te.id FROM te join cte on te.current_pid = cte.unnest;

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=1.58..1097.83 rows=3847 width=8) (actual
time=0.882..14.927 rows=1468 loops=1)
CTE cte
-> Function Scan on unnest (cost=0.00..1.00 rows=100 width=4)
(actual time=0.025..0.043 rows=205 loops=1)
-> CTE Scan on cte (cost=0.00..2.00 rows=100 width=4) (actual
time=0.027..0.083 rows=205 loops=1)
-> Index Scan using te_current_pid_idx on te (cost=0.57..10.57 rows=38
width=16) (actual time=0.011..0.071 rows=7 loops=205)
Index Cond: (current_pid = cte.unnest)
Planning Time: 2.022 ms
Execution Time: 15.044 ms

I tried random_page_cost=1, a couple of combinations with very low
cpu_index_tuple_cost and cpu_operator_cost. Only managed to get an index
scan for a few more IN entries.
Did analyze. Bumped stats target for current_pid to 5000. Did not help.

I'm out of ideas. What is the right approach to solve this ?
Thank You!

Rado

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Marc Millas 2020-07-14 18:43:51 Re: how to "explain" some ddl
Previous Message Tom Lane 2020-07-14 17:05:14 Re: how to "explain" some ddl