From: Alan Hodgson <ahodgson(at)lists(dot)simkin(dot)ca>
To: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT
Date: 2021-12-06 17:41:46
Message-ID: 024ebf6667151e797210f6f418b60283377e3bb5.camel@lists.simkin.ca
Lists: pgsql-general

On Mon, 2021-12-06 at 10:18 -0700, Michael Lewis wrote:
> What is your default_statistics_target and how accurate is that
> estimate of 5668 rows? What is random_page_cost set to by the way?

default_statistics_target = 1000
random_page_cost = 2.0 (it's on AWS on a 9000 IOPS gp2 volume)

PostgreSQL 13.5, btw.

The estimate was reasonably accurate; there were 5069 actual rows
matching.
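
For anyone reproducing this, the actual count comes from a plain
aggregate over the same table and value as in the plans below:

SELECT count(*)
FROM shipment_import_records
WHERE shipment_import_id = 5090609;
-- 5069 rows here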

> More importantly, what is the better plan that you'd like the planner
> to use with your existing indexes?

Well, it takes a few ms to grab all ~5000 rows by shipment_import_id
and then sort/limit them. Instead it takes 30 seconds to do what it is
doing, and that's only when the table is mostly cached already; it's
more like 4-5 minutes otherwise. The first plan below shows why: it
walks the table in primary key order and throws away ~28.7 million
rows before finding the first match.


# explain analyze SELECT "shipment_import_records".*
    FROM "shipment_import_records"
    WHERE "shipment_import_records"."shipment_import_id" = 5090609
    ORDER BY "shipment_import_records"."id" ASC
    LIMIT 1;
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.44..873.08 rows=1 width=243) (actual time=31689.725..31689.726 rows=1 loops=1)
   ->  Index Scan using shipment_import_records_pkey on shipment_import_records  (cost=0.44..5122405.71 rows=5870 width=243) (actual time=31689.723..31689.724 rows=1 loops=1)
         Filter: (shipment_import_id = 5090609)
         Rows Removed by Filter: 28710802
 Planning Time: 0.994 ms
 Execution Time: 31689.744 ms
(6 rows)

Just with a kludge to force the better index (the OFFSET 0 stops the
planner from flattening the subquery, so it can't use the pkey index
to satisfy the ORDER BY):

# explain analyze SELECT * FROM
    (SELECT "shipment_import_records".*
     FROM "shipment_import_records"
     WHERE "shipment_import_records"."shipment_import_id" = 5090609
     OFFSET 0) AS x
    ORDER BY "id" ASC
    LIMIT 1;

                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=10655.34..10655.34 rows=1 width=243) (actual time=4.868..4.869 rows=1 loops=1)
   ->  Sort  (cost=10655.34..10670.02 rows=5870 width=243) (actual time=4.867..4.868 rows=1 loops=1)
         Sort Key: shipment_import_records.id
         Sort Method: top-N heapsort  Memory: 27kB
         ->  Index Scan using index_shipment_import_records_on_shipment_import_id on shipment_import_records  (cost=0.44..10567.29 rows=5870 width=243) (actual time=0.037..3.560 rows=5069 loops=1)
               Index Cond: (shipment_import_id = 5090609)
 Planning Time: 0.135 ms
 Execution Time: 4.885 ms
(8 rows)
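
Another known kludge of the same flavor (untested here) is to order by
an expression; a btree on id can't satisfy ORDER BY id + 0, so the
planner falls back to the shipment_import_id index plus a small top-N
sort:

SELECT "shipment_import_records".*
FROM "shipment_import_records"
WHERE "shipment_import_records"."shipment_import_id" = 5090609
ORDER BY "shipment_import_records"."id" + 0 ASC
LIMIT 1;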

>
> Certainly a composite index would be very helpful here. Using explain
> analyze and sharing the output would give more info to go on.
>

Yeah, I am just going to add the composite index for now, but I was
hoping for a more generic option.
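
For reference, a minimal sketch of that composite index (the index
name is illustrative):

CREATE INDEX index_shipment_import_records_on_shipment_import_id_and_id
    ON shipment_import_records (shipment_import_id, id);

With the equality column first and the sort column second, the planner
can apply the filter and read rows already ordered by id, so LIMIT 1
stops at the first matching index entry with no sort and no table scan.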

Thanks for looking at it.
