Re: Why don't we consider explicit Incremental Sort?

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Richard Guo <guofenglinux(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why don't we consider explicit Incremental Sort?
Date: 2024-09-09 10:22:50
Message-ID: 40d81bb0-a7c2-4ab8-a109-52a4c9df4839@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 9/9/24 11:39, Richard Guo wrote:
> While looking at label_sort_with_costsize() due to another issue, I
> noticed that we build explicit Sort nodes but not explicit Incremental
> Sort nodes. I wonder why this is the case. It seems to me that
> Incremental Sorts are preferable in some cases. For example:
>

I think we intentionally added incremental sort ... incrementally ;-)

I don't recall the reasoning exactly, but I think we realized the
incremental sort costing can be a bit shaky (and AFAIK we saw a couple
cases of that reported). So we added it to places where the reasoning
was it wouldn't be a problem and the incremental sort would be a clear
win, e.g. thanks to the "cheap startup" thing.

> create table t (a int, b int);
> create index on t (a);
>
> set enable_seqscan to off;
>
> explain (costs off)
> select * from t t1 join t t2 on t1.a = t2.a and t1.b = t2.b;
> QUERY PLAN
> -------------------------------------------------
> Merge Join
> Merge Cond: ((t1.a = t2.a) AND (t1.b = t2.b))
> -> Sort
> Sort Key: t1.a, t1.b
> -> Index Scan using t_a_idx on t t1
> -> Sort
> Sort Key: t2.a, t2.b
> -> Index Scan using t_a_idx on t t2
> (8 rows)
>
> For the outer path of a mergejoin, I think we can leverage Incremental
> Sort to save effort. For the inner path, though, we cannot do this
> because Incremental Sort does not support mark/restore.
>
> It could be argued that what if a mergejoin with an Incremental Sort
> as the outer path is selected as the inner path of another mergejoin?
> Well, I don't think this is a problem, because mergejoin itself does
> not support mark/restore either, and we will add a Material node on
> top of it anyway in this case (see final_cost_mergejoin).
>
> label_sort_with_costsize is also called in create_append_plan,
> create_merge_append_plan and create_unique_plan. In these cases, we
> may also consider using Incremental Sort. But I haven't looked into
> these cases.
>
> Any thoughts?
>

I think one challenge with this case is that create_mergejoin_plan
creates these Sort plans explicitly - it's not "pathified" so it doesn't
go through the usual cost comparison etc. And it certainly is not the
case that incremental sort would win always, so we'd need to replicate
the cost comparison logic too.

I have not thought about this particular case too much, but how likely
is it that mergejoin will win for this plan in practice? If we consider
a query that only needs a fraction of rows (say, thanks to LIMIT),
aren't we likely to pick a nested loop (which can do incremental sort
for the outer plan)? For joins that need to run to completion it might
be a win, but there's also the higher risk of a poor costing.

I'm not saying it's not worth exploring, I'm just trying recall reasons
why it might not work. Also I don't think it's fundamentally impossible
to do mark/restore for incremental sort - I just haven't tried doing it
because it wasn't necessary. In the worst case we could simply add a
Materialize node on top, no?

regards

--
Tomas Vondra

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2024-09-09 10:36:54 Re: POC, WIP: OR-clause support for indexes
Previous Message Ashutosh Bapat 2024-09-09 10:13:58 Re: Test to dump and restore objects left behind by regression