From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Christopher Wilson <chris(dot)wilson(at)cantabcapital(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, Steven Winfield <Steven(dot)Winfield(at)cantabcapital(dot)com>
Subject: Re: Possible optimisation: push down SORT and LIMIT nodes
Date: 2018-05-30 19:02:31
Message-ID: 20180530190231.GH5164@telsasoft.com
Lists: pgsql-performance
On Wed, May 30, 2018 at 03:46:40PM +0000, Christopher Wilson wrote:
> We have a query which is rather slow (about 10 seconds), and it looks like this:
>
> The inventory table has the quantity of each asset in the inventory on each
> date (complete SQL to create and populate the tables with dummy data is
> below). The query plan looks like this (the non-parallel version is similar):
Hi,
Thanks for including the test case.
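(The query itself was trimmed from the quote above; judging from the plan and
from the join in my rewrite below, it was presumably of this form:)

  -- reconstructed from the plan and test case, not copied from the original mail
  SELECT inventory.date, asset.name, inventory.quantity
  FROM temp.inventory
  LEFT JOIN temp.asset ON asset.id = inventory.id_asset
  ORDER BY inventory.date, asset.name
  LIMIT 100;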
> Limit (cost=217591.77..217603.60 rows=100 width=32) (actual time=9122.235..9122.535 rows=100 loops=1)
...
> -> Sort (cost=216591.73..220628.83 rows=1614839 width=32) (actual time=8879.909..8880.030 rows=727 loops=4)
> Sort Key: inventory.date, asset.name
> Sort Method: external merge Disk: 50904kB
> Buffers: shared hit=27365, temp read=25943 written=25947
Yep, the sort is expensive and largely wasted.
> I'm imagining something like a sort-limit-finish node, which sorts its input
> and then returns at least the limit number of rows, but keeps returning rows
> until it exhausts the last sort prefix that it read.
[...]
> Does this sound correct, reasonable and potentially interesting to Postgres
> developers?
I think your analysis may be (?) unnecessarily specific to your particular
problem query.
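To illustrate, the result set your proposed node would produce can be spelled
out in plain SQL with a window function - a sketch only, assuming the leading
sort column (date) is the prefix in question:

  -- Return at least 100 rows, then finish out the last "prefix" (date) group:
  -- RANK() gives tied dates equal rank, so the filter keeps every row of the
  -- date group containing the 100th row.  (This only shows the intended
  -- result set; the window scan itself reads everything, so it isn't fast.)
  WITH ranked AS (
      SELECT inventory.date, asset.name, inventory.quantity,
             RANK() OVER (ORDER BY inventory.date) AS prefix_rank
      FROM temp.inventory
      LEFT JOIN temp.asset ON asset.id = inventory.id_asset
  )
  SELECT date, name, quantity
  FROM ranked
  WHERE prefix_rank <= 100
  ORDER BY date, name;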
For diagnostic purposes, I was able to vastly improve the query runtime using
a CTE (WITH), which acts as an optimization fence, so the LIMIT is applied
before the sort.  (Note this isn't equivalent to the original query: it sorts
an arbitrary 99 rows rather than the first 99 in sort order, so it only
demonstrates the potential speedup.)
|postgres=# explain(analyze,buffers) WITH x AS (SELECT inventory.date, asset.name, inventory.quantity FROM temp.inventory LEFT JOIN temp.asset ON asset.id=id_asset LIMIT 99) SELECT * FROM x ORDER BY date, name;
| Sort (cost=1090.60..1090.85 rows=99 width=40) (actual time=3.764..3.988 rows=99 loops=1)
| Sort Key: x.date, x.name
| Sort Method: quicksort Memory: 32kB
| Buffers: shared hit=298
| CTE x
| -> Limit (cost=0.28..889.32 rows=99 width=31) (actual time=0.063..2.385 rows=99 loops=1)
| Buffers: shared hit=298
| -> Nested Loop Left Join (cost=0.28..44955006.99 rows=5006001 width=31) (actual time=0.058..1.940 rows=99 loops=1)
| Buffers: shared hit=298
| -> Seq Scan on inventory (cost=0.00..5033061.00 rows=5006001 width=12) (actual time=0.020..0.275 rows=99 loops=1)
| Buffers: shared hit=1
| -> Index Scan using asset_pkey on asset (cost=0.28..7.98 rows=1 width=27) (actual time=0.008..0.008 rows=1 loops=99)
| Index Cond: (id = inventory.id_asset)
| Buffers: shared hit=297
| -> CTE Scan on x (cost=0.00..198.00 rows=99 width=40) (actual time=0.073..2.989 rows=99 loops=1)
| Buffers: shared hit=298
| Planning time: 0.327 ms
| Execution time: 4.260 ms
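The CTE isn't essential to the trick, by the way: a LIMIT in a subquery
already keeps the outer ORDER BY from being applied underneath it, so a plain
subselect should yield the same limit-then-sort plan (untested sketch):

  SELECT x.date, x.name, x.quantity
  FROM (
      -- the LIMIT prevents the subquery from being flattened,
      -- so the sort above sees only 99 rows
      SELECT inventory.date, asset.name, inventory.quantity
      FROM temp.inventory
      LEFT JOIN temp.asset ON asset.id = inventory.id_asset
      LIMIT 99
  ) AS x
  ORDER BY x.date, x.name;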
It's not clear to me whether there's some reason the planner couldn't use a
similar plan here (sort-limit-... rather than limit-sort-...).
Justin