The hidden cost of limit-offset

From: 孙冰 <subi(dot)the(dot)dream(dot)walker(at)gmail(dot)com>
To: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: The hidden cost of limit-offset
Date: 2020-12-06 07:14:07
Message-ID: CA+czfDWpkNTkZ+8idweWV6oGL1RuVeDgZr1dE70ZObg_a4hG7w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

The skipped rows by an OFFSET clause have to be computed nevertheless. I am
wondering if there could be any chance to improve, since the computation is
on the *entire* rows rather than on the *criterial* columns.

Consider the following example.

Create a sample table and insert some random data.

create table thing (
id int generated always as identity,
tag float default random()
);
insert into thing select from generate_series(1,100);

Let's find the last 10 rows ordered by tag:

explain (analyze, verbose)
select id, pg_sleep(0.1) from thing
order by tag offset 90;

The execution plan is:

Limit (cost=5.77..5.92 rows=10 width=16) (actual time=9210.751..10121.411
rows=10 loops=1)
Output: id, (pg_sleep('0.1'::double precision)), tag
-> Result (cost=4.42..5.92 rows=100 width=16) (actual
time=101.251..10121.344 rows=100 loops=1)
Output: id, pg_sleep('0.1'::double precision), tag
-> Sort (cost=4.42..4.67 rows=100 width=12) (actual
time=0.064..0.200 rows=100 loops=1)
Output: id, tag
Sort Key: thing.tag
Sort Method: quicksort Memory: 29kB
-> Seq Scan on public.thing (cost=0.00..1.10 rows=100
width=12) (actual time=0.012..0.019 rows=100 loops=1)
Output: id, tag
Planning Time: 0.176 ms
Execution Time: 10121.450 ms

Obviously, all the 100 rows are computed before offsetting & limiting. But
something really catches my eye:

1. the 100 rows oftagare necessary becausetagis the sort key;
2. the 100 rows ofidare not all necessary but it is fine sinceidand tagare
fetched from the same relation;
3. the 100 rows of pg_sleep(0.1) cause the major performance hit
unnecessarily; pg_sleep(0.1) depends on neithertagnorid.

Could we improve the situation? Maybe the Limit node could be done in
advance before the Result node for id and pg_sleep(0.1)? The execution plan
could be something like:

Result (cost=5.77..5.92 rows=10 width=16)
Output: id, pg_sleep('0.1'::double precision), tag
-> Limit (cost=4.42..5.92 rows=10 width=12)
Output: id, tag
-> Sort (cost=4.42..4.67 rows=100 width=12)
Output: id, tag
Sort Key: thing.tag
-> Seq Scan on public.thing (cost=0.00..1.10 rows=100
width=12)
Output: id, tag

Generally, if the Limit process could be executed immediately after the
criterial attributes are computed rather than the full output is evaluated,
pagination by limit-offset could be considerably more performant for broad
use cases. That is, pagination is done by several keys which is usually
static, while a few dynamic but expensive attributes can be computed quite
effectively since the result set is typically very small, by its
(nested-)loop nature.

I don't understand the postgresql internal, but I suspect such a change may
introduce significant work on the planner and executor. From my point view,
skipping everything (or expensive ones) except the criteria in the target
list would greatly improve the usability of OFFSET, and it is definitely
worth the effort.

Best regards.

SUN Bing

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David G. Johnston 2020-12-06 16:05:41 Re: The hidden cost of limit-offset
Previous Message Laurenz Albe 2020-12-06 05:15:43 Re: Set COLLATE on a session level