Re: Why don't we consider explicit Incremental Sort?

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why don't we consider explicit Incremental Sort?
Date: 2024-09-24 02:21:28
Message-ID: CAMbWs4-3Z-pcHCWq3gBVHp=43kqaF3Quhwe5cUg-x9O54Lv2CA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 23, 2024 at 10:01 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> You don't need any skew. Consider this perfectly uniform dataset:
>
> create table t (a int, b int);
> insert into t select 100000 * random(), 100 * random()
> from generate_series(1,1000000) s(i);
> create index on t (a);
> vacuum analyze;
> checkpoint;
>
> explain analyze select * from t order by a, b;

I think if we want to compare the performance of incremental sort vs.
full sort on this dataset, we need to ensure that other variables are
kept constant, ie we need to ensure that both methods use the same
subpath, whether it's Index Scan or Seq Scan.

This is especially true in the scenario addressed by this patch, as
for a mergejoin, we only consider outerrel's cheapest_total_path when
we assume that an explicit sort atop is needed. That is to say, the
subpath has already been chosen when it comes to determine whether to
use incremental sort or full sort.

According to this theory, here is what I got on this same dataset:

-- incremental sort
explain analyze select * from t order by a, b;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Incremental Sort (cost=1.02..68838.65 rows=1000000 width=8) (actual
time=1.292..1564.265 rows=1000000 loops=1)
Sort Key: a, b
Presorted Key: a
Full-sort Groups: 27022 Sort Method: quicksort Average Memory:
25kB Peak Memory: 25kB
-> Index Scan using t_a_idx on t (cost=0.42..37244.20
rows=1000000 width=8) (actual time=0.408..1018.785 rows=1000000
loops=1)
Planning Time: 0.998 ms
Execution Time: 1602.861 ms
(7 rows)

-- full sort
set enable_incremental_sort to off;
set enable_seqscan to off;
explain analyze select * from t order by a, b;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=150576.54..153076.54 rows=1000000 width=8) (actual
time=1760.090..1927.598 rows=1000000 loops=1)
Sort Key: a, b
Sort Method: external merge Disk: 17704kB
-> Index Scan using t_a_idx on t (cost=0.42..37244.20
rows=1000000 width=8) (actual time=0.531..1010.931 rows=1000000
loops=1)
Planning Time: 0.980 ms
Execution Time: 1970.287 ms
(6 rows)

So incremental sort outperforms full sort on this dataset.

Thanks
Richard

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-09-24 02:24:46 Re: installcheck-world concurrency issues
Previous Message Shayon Mukherjee 2024-09-24 01:44:17 Re: Proposal to Enable/Disable Index using ALTER INDEX