Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Francisco Olarte <folarte(at)peoplecall(dot)com>
Cc: Alan Hodgson <ahodgson(at)lists(dot)simkin(dot)ca>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT
Date: 2021-12-06 18:22:15
Message-ID: CAFj8pRDsmF2PVKOQK47EhOfhSPrKuRoUSfb=cwJJSM+NubbUMg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

po 6. 12. 2021 v 18:21 odesílatel Francisco Olarte <folarte(at)peoplecall(dot)com>
napsal:

> On Mon, 6 Dec 2021 at 18:03, Alan Hodgson <ahodgson(at)lists(dot)simkin(dot)ca>
> wrote:
> ...
> > The table has nearly 29 million records. 5069 of them match
> shipment_import_id = 5090609. There is an index on shipment_import_id,
> which the planner happily uses without the LIMIT specifically. Yet with it
> the query planner will always do something like:
> >
> > # explain SELECT "shipment_import_records".* FROM
> "shipment_import_records" WHERE
> "shipment_import_records"."shipment_import_id" = 5090609 ORDER BY
> "shipment_import_records"."id" ASC LIMIT 1;
> > QUERY PLAN
> >
> -----------------------------------------------------------------------------------------------------------------------------
> > Limit (cost=0.44..873.35 rows=1 width=243)
> > -> Index Scan using shipment_import_records_pkey on
> shipment_import_records (cost=0.44..5122227.70 rows=5868 width=243)
> > Filter: (shipment_import_id = 5090609)
> > .. which takes minutes.
>
> Can you post an explain analyze? To me it seems like the planner
> thinks shipment_import_id is randomly distributed and the table is
> well correlated with it's PK, so scanning it for the first id should
> be fast.
>
> But from the names of the field you may have correlation between
> shipment_import_id and id hidden somewhere ( like they are two serial
> growing together, you query for the latest shipment ids and it scans
> all the table ). An explain analyze should show that ( or three, one
> for that shipment import id, one for 1, one for a really big one )
>
> > Just wondering if there's a knob I can turn to make these more likely to
> work without constantly implementing workarounds?
>
> You may try a composite index.
>

+1 These issues can be solved by composite indexes. The low limit clause
deforms costs and when the data are not really random, then index scan can
be too long.

> Francisco Olarte.
>
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Phil Endecott 2021-12-06 18:48:47 Advice on using materialized views
Previous Message Adrian Klaver 2021-12-06 18:07:57 Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT