Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT

From: Alan Hodgson <ahodgson(at)lists(dot)simkin(dot)ca>
To: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT
Date: 2021-12-06 17:03:02
Message-ID: bc47e9d0c490942eebb01dc1129097012005e1d2.camel@lists.simkin.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I keep running into problems like these:

Devs are using an ORM. It really likes to produce queries like:

SELECT "shipment_import_records".* FROM "shipment_import_records"
WHERE shipment_import_records"."shipment_import_id" = 5090609 ORDER
BY "shipment_import_records"."id" ASC LIMIT 1;

I don't know why they do this. Usually it's more like 50 for
pagination which make more sense. But for whatever reason this keeps
coming up.

The table has nearly 29 million records. 5069 of them match
shipment_import_id = 5090609. There is an index
on shipment_import_id, which the planner happily uses without the
LIMIT specifically. Yet with it the query planner will always do
something like:

# explain SELECT "shipment_import_records".* FROM
"shipment_import_records" WHERE
"shipment_import_records"."shipment_import_id" = 5090609 ORDER BY
"shipment_import_records"."id" ASC LIMIT 1;         
                                                        QUERY PLAN
                                                          
---------------------------------------------------------------------
--------------------------------------------------------
Limit  (cost=0.44..873.35 rows=1 width=243)
  ->  Index Scan using shipment_import_records_pkey on
shipment_import_records  (cost=0.44..5122227.70 rows=5868 width=243)
        Filter: (shipment_import_id = 5090609)

.. which takes minutes.

I know I can work around this. Generally I would just drop the index
on shipment_import_id and create one on shipment_import_id,id. Or if
I can get the devs to wrap their query in an inner select with a fake
offset to fool the query planner that works too. But both seem hacky.

Just wondering if there's a knob I can turn to make these more likely
to work without constantly implementing workarounds?

Thanks for any help.

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Michael Lewis 2021-12-06 17:18:11 Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT
Previous Message Avi Weinberg 2021-12-06 14:50:35 Are Foreign Key Disabled During Logical Replication Initial Sync?