Re: Performance regression when adding LIMIT 1 to a query

From: Costa Alexoglou <costa(at)dbtune(dot)com>
To: Artur Zakirov <zaartur(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Performance regression when adding LIMIT 1 to a query
Date: 2025-04-07 15:34:47
Message-ID: CAJ+5Ff6-gwJ+xYgM9fefx+T6T5pSfbo2LKbiui3s3ZXUmjatPQ@mail.gmail.com
Lists: pgsql-general

On Mon, Apr 7, 2025 at 4:06 PM Artur Zakirov <zaartur(at)gmail(dot)com> wrote:

> On Mon, 7 Apr 2025 at 14:45, Costa Alexoglou <costa(at)dbtune(dot)com> wrote:
> > ...
> > with a plan:
> > ```
> > Gather Merge  (cost=115584.47..118515.35 rows=25120 width=824) (actual time=46.004..74.267 rows=29653 loops=1)
> >   Workers Planned: 2
> >   Workers Launched: 2
> >   ->  Sort  (cost=114584.45..114615.85 rows=12560 width=824) (actual time=41.200..47.322 rows=9884 loops=3)
> >         Sort Key: id
> >         Sort Method: external merge  Disk: 16360kB
> >         Worker 0:  Sort Method: external merge  Disk: 15552kB
> >         Worker 1:  Sort Method: external merge  Disk: 14536kB
> >         ->  Parallel Bitmap Heap Scan on databases_metrics  (cost=990.77..109175.83 rows=12560 width=824) (actual time=3.326..14.295 rows=9884 loops=3)
> >               Recheck Cond: ((db_instance_id = 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND (created_at >= '2023-03-15 10:00:00+00'::timestamp with time zone) AND (created_at <= '2025-04-03 10:00:00+00'::timestamp with time zone))
> >               Filter: ((metrics -> 'perf_average_query_runtime'::text) IS NOT NULL)
> >               Rows Removed by Filter: 68
> >               Heap Blocks: exact=4272
> >               ->  Bitmap Index Scan on idx_databases_metrics_instance_date_custom_created_debugging  (cost=0.00..983.24 rows=30294 width=0) (actual time=3.786..3.786 rows=29856 loops=1)
> >                     Index Cond: ((db_instance_id = 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND (created_at >= '2023-03-15 10:00:00+00'::timestamp with time zone) AND (created_at <= '2025-04-03 10:00:00+00'::timestamp with time zone))
> > ```
> > ...
> > With a plan:
> > ```
> > Limit  (cost=0.43..229.66 rows=1 width=824) (actual time=7538.004..7538.005 rows=1 loops=1)
> >   ->  Index Scan using databases_metrics_pkey on databases_metrics  (cost=0.43..6909156.38 rows=30142 width=824) (actual time=7538.002..7538.003 rows=1 loops=1)
> >         Filter: ((created_at >= '2023-03-15 10:00:00+00'::timestamp with time zone) AND (created_at <= '2025-04-03 10:00:00+00'::timestamp with time zone) AND ((metrics -> 'perf_average_query_runtime'::text) IS NOT NULL) AND (db_instance_id = 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid))
> >         Rows Removed by Filter: 10244795
> > Planning Time: 0.128 ms
> > Execution Time: 7538.032 ms
> > ```
>
> For your second query Postgres uses the index "databases_metrics_pkey".
> I assume it is built on the "id" column. Such an index can be very fast
> for an "ORDER BY ... LIMIT" statement, but because of the additional
> filter Postgres first has to skip 10 million rows that don't satisfy
> the filter before it reaches a single matching row.
>
> For the first query Postgres only has to read and sort 29k rows using
> the index "idx_databases_metrics_instance_date_custom_created_debugging",
> which suits the filter much better because it covers the filtered
> columns.
>
> I'm not sure why Postgres chooses the index "databases_metrics_pkey".
> Maybe you have outdated statistics. Did you try to run VACUUM ANALYZE
> on the table?
>
> --
> Kind regards,
> Artur
>

> Maybe you have outdated statistics. Did you try to run VACUUM ANALYZE
> on the table?

Yes, I just tried that; nothing changed.
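For anyone following along, the statistics angle can be pushed a bit further than a plain VACUUM ANALYZE by raising the per-column statistics target before re-analyzing. This is only a sketch: the table and column names are from this thread, and 1000 is an arbitrary illustrative target (the default is 100).

```sql
-- Sketch: make the planner sample this column more finely, then
-- rebuild the statistics. 1000 is an illustrative value, not a tuned one.
ALTER TABLE databases_metrics
    ALTER COLUMN db_instance_id SET STATISTICS 1000;
ANALYZE databases_metrics;
```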

I found this post:
https://bohanzhang.me/assets/blogs/order_by_limit/order_by_limit.html

And when I applied the suggestion to add `+0` to the "ORDER BY" column,
the query ran much faster (the expression stops the planner from using the
primary-key index to satisfy the sort):
```
SELECT "databases_metrics"."metrics"
FROM "databases_metrics"
WHERE (
    "databases_metrics"."created_at" >= '2023-03-15 10:00:00+00:00'::timestamptz
    AND "databases_metrics"."db_instance_id" = 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::UUID
    AND "databases_metrics"."created_at" <= '2025-04-03 10:00:00+00:00'::timestamptz
    AND (metrics -> 'perf_average_query_runtime' IS NOT NULL)
)
ORDER BY "databases_metrics"."id" + 0 ASC
LIMIT 1;
```

the plan:
```
Limit  (cost=97195.71..97195.71 rows=1 width=821) (actual time=38.084..38.086 rows=1 loops=1)
  ->  Sort  (cost=97195.71..97261.91 rows=26482 width=821) (actual time=38.083..38.084 rows=1 loops=1)
        Sort Key: ((id + 0))
        Sort Method: top-N heapsort  Memory: 27kB
        ->  Bitmap Heap Scan on databases_metrics  (cost=867.87..97063.30 rows=26482 width=821) (actual time=4.686..30.036 rows=29653 loops=1)
              Recheck Cond: ((db_instance_id = 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND (created_at >= '2023-03-15 10:00:00+00'::timestamp with time zone) AND (created_at <= '2025-04-03 10:00:00+00'::timestamp with time zone))
              Filter: ((metrics -> 'perf_average_query_runtime'::text) IS NOT NULL)
              Rows Removed by Filter: 203
              Heap Blocks: exact=13492
              ->  Bitmap Index Scan on idx_databases_metrics_instance_date_custom_created_debugging  (cost=0.00..861.25 rows=26615 width=0) (actual time=2.695..2.696 rows=29856 loops=1)
                    Index Cond: ((db_instance_id = 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND (created_at >= '2023-03-15 10:00:00+00'::timestamp with time zone) AND (created_at <= '2025-04-03 10:00:00+00'::timestamp with time zone))
Planning Time: 0.129 ms
Execution Time: 38.121 ms
```
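For the archives: besides the `+0` trick, a materialized CTE (PostgreSQL 12+) should act as an optimization fence in the same way, forcing the filter to be evaluated before the ORDER BY ... LIMIT. A sketch using the same table and columns from this thread, not something I benchmarked:

```sql
-- Sketch: MATERIALIZED makes the CTE a fence, so the planner cannot
-- reach for databases_metrics_pkey to satisfy the outer sort.
WITH filtered AS MATERIALIZED (
    SELECT id, metrics
    FROM databases_metrics
    WHERE db_instance_id = 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid
      AND created_at >= '2023-03-15 10:00:00+00'::timestamptz
      AND created_at <= '2025-04-03 10:00:00+00'::timestamptz
      AND metrics -> 'perf_average_query_runtime' IS NOT NULL
)
SELECT metrics
FROM filtered
ORDER BY id ASC
LIMIT 1;
```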
