Re: Possible optimisation: push down SORT and LIMIT nodes

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Christopher Wilson <chris(dot)wilson(at)cantabcapital(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, Steven Winfield <Steven(dot)Winfield(at)cantabcapital(dot)com>
Subject: Re: Possible optimisation: push down SORT and LIMIT nodes
Date: 2018-05-30 19:02:31
Message-ID: 20180530190231.GH5164@telsasoft.com

On Wed, May 30, 2018 at 03:46:40PM +0000, Christopher Wilson wrote:
> We have a query which is rather slow (about 10 seconds), and it looks like this:
>
> The inventory table has the quantity of each asset in the inventory on each
> date (complete SQL to create and populate the tables with dummy data is
> below). The query plan looks like this (the non-parallel version is similar):

Hi,

Thanks for including the test case.
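
For anyone reading along in the archive, the tables look roughly like this
(reconstructed from the plans below; the exact DDL is in the quoted message,
so treat this as an approximation):

|CREATE SCHEMA temp;
|
|CREATE TABLE temp.asset (
|    id   serial PRIMARY KEY,      -- "asset_pkey" in the plans
|    name text NOT NULL
|);
|
|CREATE TABLE temp.inventory (
|    date     date NOT NULL,
|    id_asset integer REFERENCES temp.asset (id),
|    quantity integer NOT NULL
|);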

> Limit (cost=217591.77..217603.60 rows=100 width=32) (actual time=9122.235..9122.535 rows=100 loops=1)
...
> -> Sort (cost=216591.73..220628.83 rows=1614839 width=32) (actual time=8879.909..8880.030 rows=727 loops=4)
> Sort Key: inventory.date, asset.name
> Sort Method: external merge Disk: 50904kB
> Buffers: shared hit=27365, temp read=25943 written=25947

Yep, the sort is expensive and largely wasted.
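
(As an aside, the "external merge Disk: 50904kB" line means each worker's
sort spilled to disk. A separate diagnostic would be to raise work_mem well
above the spill size, e.g.

|SET work_mem = '256MB';

to keep the sort in memory; but that only makes the wasted sort cheaper,
it doesn't avoid sorting ~1.6M rows per worker just to return 100.)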

> I'm imagining something like a sort-limit-finish node, which sorts its input
> and then returns at least the limit number of rows, but keeps returning rows
> until it exhausts the last sort prefix that it read.
[...]
> Does this sound correct, reasonable and potentially interesting to Postgres
> developers?

I think your analysis may be (?) unnecessarily specific to your particular
problem query.

For diagnostic purposes, I was able to vastly improve the query runtime with
a CTE (WITH):

|postgres=# explain(analyze,buffers) WITH x AS (SELECT inventory.date, asset.name, inventory.quantity FROM temp.inventory LEFT JOIN temp.asset ON asset.id=id_asset LIMIT 99) SELECT * FROM x ORDER BY date, name;
| Sort (cost=1090.60..1090.85 rows=99 width=40) (actual time=3.764..3.988 rows=99 loops=1)
| Sort Key: x.date, x.name
| Sort Method: quicksort Memory: 32kB
| Buffers: shared hit=298
| CTE x
| -> Limit (cost=0.28..889.32 rows=99 width=31) (actual time=0.063..2.385 rows=99 loops=1)
| Buffers: shared hit=298
| -> Nested Loop Left Join (cost=0.28..44955006.99 rows=5006001 width=31) (actual time=0.058..1.940 rows=99 loops=1)
| Buffers: shared hit=298
| -> Seq Scan on inventory (cost=0.00..5033061.00 rows=5006001 width=12) (actual time=0.020..0.275 rows=99 loops=1)
| Buffers: shared hit=1
| -> Index Scan using asset_pkey on asset (cost=0.28..7.98 rows=1 width=27) (actual time=0.008..0.008 rows=1 loops=99)
| Index Cond: (id = inventory.id_asset)
| Buffers: shared hit=297
| -> CTE Scan on x (cost=0.00..198.00 rows=99 width=40) (actual time=0.073..2.989 rows=99 loops=1)
| Buffers: shared hit=298
| Planning time: 0.327 ms
| Execution time: 4.260 ms
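
The CTE helps because WITH acts as an optimization fence here, so the LIMIT
is applied before the ORDER BY rather than after it. A plain subquery with a
LIMIT should behave the same way, since the LIMIT keeps the subquery from
being flattened (untested sketch):

|SELECT * FROM (
|    SELECT inventory.date, asset.name, inventory.quantity
|    FROM temp.inventory
|    LEFT JOIN temp.asset ON asset.id = id_asset
|    LIMIT 99
|) x
|ORDER BY date, name;

Note this isn't the same query as yours: it sorts an arbitrary 99 rows
rather than returning the first 100 rows in sorted order, which is why it's
only useful for diagnosis.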

It's not clear to me whether there's some reason the planner couldn't use a
similar plan (sort-limit-... rather than limit-sort-...).

Justin
