A cost issue in ORDER BY + LIMIT

From: Paul Guo <paulguo(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: A cost issue in ORDER BY + LIMIT
Date: 2022-08-06 15:38:17
Message-ID: CABQrizdnkP4zmd5gQoqCqN9s-VCVqUWyFja6M072QPLFnA5T0g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

Postgres seems to always optimize ORDER BY + LIMIT as top-k sort.
Recently I happened to notice
that in this scenario the output tuple number of the sort node is not
the same as the LIMIT tuple number.

See below,

postgres=# explain analyze verbose select * from t1 order by a limit 10;
QUERY PLAN

--------------------------------------------------------------------------------------------------
------------------------------
Limit (cost=69446.17..69446.20 rows=10 width=4) (actual
time=282.896..282.923 rows=10 loops=1)
Output: a
-> Sort (cost=69446.17..71925.83 rows=991862 width=4) (actual
time=282.894..282.896 rows=10 l
oops=1)
Output: a
Sort Key: t1.a
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on public.t1 (cost=0.00..44649.62 rows=991862
width=4) (actual time=0.026..
195.438 rows=1001000 loops=1)
Output: a
Planning Time: 0.553 ms
Execution Time: 282.961 ms
(10 rows)

We can see from the output that the LIMIT cost is wrong also due to
this since it assumes the input_rows
from the sort node is 991862 (per gdb debugging).

This could be easily fixed by below patch,

diff --git a/src/backend/optimizer/path/costsize.c
b/src/backend/optimizer/path/costsize.c
index fb28e6411a..800cf0b256 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2429,7 +2429,11 @@ cost_sort(Path *path, PlannerInfo *root,

startup_cost += input_cost;

- path->rows = tuples;
+ if (limit_tuples > 0 && limit_tuples < tuples)
+ path->rows = limit_tuples;
+ else
+ path->rows = tuples;
+
path->startup_cost = startup_cost;
path->total_cost = startup_cost + run_cost;
}

Withe the patch the explain output looks like this.

postgres=# explain analyze verbose select * from t1 order by a limit 10;
QUERY PLAN

--------------------------------------------------------------------------------------------------
------------------------------
Limit (cost=69446.17..71925.83 rows=10 width=4) (actual
time=256.204..256.207 rows=10 loops=1)
Output: a
-> Sort (cost=69446.17..71925.83 rows=10 width=4) (actual
time=256.202..256.203 rows=10 loops
=1)
Output: a
Sort Key: t1.a
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on public.t1 (cost=0.00..44649.62 rows=991862
width=4) (actual time=1.014..
169.509 rows=1001000 loops=1)
Output: a
Planning Time: 0.076 ms
Execution Time: 256.232 ms
(10 rows)

Regards,
Paul

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhang Mingli 2022-08-06 15:48:55 Re: A cost issue in ORDER BY + LIMIT
Previous Message Andres Freund 2022-08-06 15:32:49 Re: `make check` doesn't pass on MacOS Catalina