Odd (slow) plan choice with min/max

From: Paul McGarry <paul(at)paulmcgarry(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Odd (slow) plan choice with min/max
Date: 2021-03-23 04:00:38
Message-ID: CAPrE0SYrq9VvWiKqNMKOhkRu0c-zTGsSOnSmL3TrHZNt6KOLUw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi all,

I have a query where Postgresql (11.9 at the moment) is making an odd plan
choice, choosing to use index scans which require filtering out millions of
rows, rather than "just" doing an aggregate over the rows the where clause
targets which is much faster.
AFAICT it isn't a statistics problem, at least increasing the stats target
and analyzing the table doesn't seem to fix the problem.

The query looks like:

======
explain analyze select min(risk_id),max(risk_id) from risk where
time>='2020-01-20 15:00:07+00' and time < '2020-01-21 15:00:08+00';

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=217.80..217.81 rows=1 width=16) (actual
time=99722.685..99722.687 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.57..108.90 rows=1 width=8) (actual
time=38454.537..38454.538 rows=1 loops=1)
-> Index Scan using risk_risk_id_key on risk
(cost=0.57..9280362.29 rows=85668 width=8) (actual
time=38454.535..38454.536 rows=1 loops=1)
Index Cond: (risk_id IS NOT NULL)
Filter: (("time" >= '2020-01-20 15:00:07+00'::timestamp
with time zone) AND ("time" < '2020-01-21 15:00:08+00'::timestamp with time
zone))
Rows Removed by Filter: 161048697
InitPlan 2 (returns $1)
-> Limit (cost=0.57..108.90 rows=1 width=8) (actual
time=61268.140..61268.140 rows=1 loops=1)
-> Index Scan Backward using risk_risk_id_key on risk risk_1
(cost=0.57..9280362.29 rows=85668 width=8) (actual
time=61268.138..61268.139 rows=1 loops=1)
Index Cond: (risk_id IS NOT NULL)
Filter: (("time" >= '2020-01-20 15:00:07+00'::timestamp
with time zone) AND ("time" < '2020-01-21 15:00:08+00'::timestamp with time
zone))
Rows Removed by Filter: 41746396
Planning Time: 0.173 ms
Execution Time: 99722.716 ms
(15 rows)
======

If I add a count(*) so it has to consider all rows in the range for that
part of the query and doesn't consider using the other index for a min/max
"shortcut" then the query is fast.
======
explain analyze select min(risk_id),max(risk_id), count(*) from risk where
time>='2020-01-20 15:00:07+00' and time < '2020-01-21 15:00:08+00';

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=4376.67..4376.68 rows=1 width=24) (actual
time=30.011..30.012 rows=1 loops=1)
-> Index Scan using risk_time_idx on risk (cost=0.57..3734.17
rows=85667 width=8) (actual time=0.018..22.441 rows=90973 loops=1)
Index Cond: (("time" >= '2020-01-20 15:00:07+00'::timestamp with
time zone) AND ("time" < '2020-01-21 15:00:08+00'::timestamp with time
zone))
Planning Time: 0.091 ms
Execution Time: 30.045 ms
(5 rows)
======

My count() hack works around my immediate problem but I'm trying to get my
head round why Postgres chooses the plan it does without it, in case there
is some general problem with my configuration that may negatively effect
other areas, or there's something else I am missing.

Any ideas?

Paul McGarry

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Justin Pryzby 2021-03-23 05:13:03 Re: Odd (slow) plan choice with min/max
Previous Message Hannu Krosing 2021-03-22 21:39:02 Re: SQL performance issue (postgresql chooses a bad plan when a better one is available)