Quick Links

Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT

From:	Alan Hodgson <ahodgson(at)lists(dot)simkin(dot)ca>
To:	Postgres General <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT
Date:	2021-12-06 17:57:53
Message-ID:	ac6b2c9214684d7645554b0673d4a7148c0a52e9.camel@lists.simkin.ca
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On Mon, 2021-12-06 at 18:20 +0100, Francisco Olarte wrote:
>
> Can you post an explain analyze? To me it seems like the planner
> thinks shipment_import_id is randomly distributed and the table is
> well correlated with it's PK, so scanning it for the first id
> should
> be fast.

#explain analyze SELECT "shipment_import_records".* FROM
   shipment_import_records" WHERE
   shipment_import_records"."shipment_import_id" = 5090609 ORDER BY
   "shipment_import_records"."id" ASC LIMIT 1;
   ------------------------------------------------------------------
---
   ------------------------------------------------------------------
---
   -------------------------------------
    Limit (cost=0.44..873.08 rows=1 width=243) (actual
   time=31689.725..31689.726 rows=1 loops=1)
      -> Index Scan using shipment_import_records_pkey on
   shipment_import_records (cost=0.44..5122405.71 rows=5870
width=243)
   (actual time=31689.723..31689.724 rows=1 loops=1)
            Filter: (shipment_import_id = 5090609)
            Rows Removed by Filter: 28710802
    Planning Time: 0.994 ms
    Execution Time: 31689.744 ms
   (6 rows)

The biggest one (but yes "earlier"):

# explain analyze SELECT "shipment_import_records".* FROM
"shipment_import_records" WHERE
"shipment_import_records"."shipment_import_id" = 1247888 ORDER BY
"shipment_import_records"."id" ASC LIMIT 1;

QUERY PLAN



---------------------------------------------------------------------
---------------------------------------------------------------------
----------------------------
--------
Limit (cost=0.44..426.59 rows=1 width=243) (actual
time=8007.069..8007.070 rows=1 loops=1)
  -> Index Scan using shipment_import_records_pkey on
shipment_import_records (cost=0.44..5126628.40 rows=12030 width=243)
(actual time=8007.068..8007.068 rows=1 l
oops=1)
        Filter: (shipment_import_id = 1247888)
        Rows Removed by Filter: 10929193
Planning Time: 0.584 ms
Execution Time: 8007.086 ms
(6 rows)

And the smallest/latest, which actually uses the "right" index:

# explain analyze SELECT "shipment_import_records".* FROM
"shipment_import_records" WHERE
"shipment_import_records"."shipment_import_id" = 5116174 ORDER BY
"shipment_import_records"."id" ASC LIMIT 1;

QUERY PLAN


---------------------------------------------------------------------
---------------------------------------------------------------------
----------------------------
------------------------
Limit (cost=145.44..145.44 rows=1 width=243) (actual
time=0.018..0.018 rows=1 loops=1)
  -> Sort (cost=145.44..145.64 rows=79 width=243) (actual
time=0.017..0.018 rows=1 loops=1)
        Sort Key: id
        Sort Method: quicksort Memory: 26kB
        -> Index Scan using
index_shipment_import_records_on_shipment_import_id on
shipment_import_records (cost=0.44..145.05 rows=79 width=243)
(actual time=0.013
..0.014 rows=1 loops=1)
              Index Cond: (shipment_import_id = 5116174)
Planning Time: 0.104 ms
Execution Time: 0.032 ms
(8 rows)

>
> But from the names of the field you may have correlation between
> shipment_import_id and id hidden somewhere ( like they are two
> serial
> growing together, you query for the latest shipment ids and it
> scans
> all the table ). An explain analyze should show that ( or three,
> one
> for that shipment import id, one for 1, one for a really big one )

This is definitely the case. And we are generally looking for newer
data for most operations.

Thanks for looking at it.

In response to

Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT at 2021-12-06 17:20:44 from Francisco Olarte

Responses

Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT at 2021-12-07 12:34:54 from Francisco Olarte

Browse pgsql-general by date

	From	Date	Subject
Next Message	Alan Hodgson	2021-12-06 18:02:44	Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT
Previous Message	Alan Hodgson	2021-12-06 17:41:46	Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT