Re: Why don't we consider explicit Incremental Sort?

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why don't we consider explicit Incremental Sort?
Date: 2024-09-13 11:51:55
Message-ID: f8f86cb5-cb60-4722-b3c7-99c7da62ace2@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 9/13/24 13:18, Richard Guo wrote:
> On Mon, Sep 9, 2024 at 6:22 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>> I have not thought about this particular case too much, but how likely
>> is it that mergejoin will win for this plan in practice? If we consider
>> a query that only needs a fraction of rows (say, thanks to LIMIT),
>> aren't we likely to pick a nested loop (which can do incremental sort
>> for the outer plan)? For joins that need to run to completion it might
>> be a win, but there's also the higher risk of a poor costing.
>
> I think one situation where mergejoin tends to outperform hashjoin and
> nestloop is when ORDER BY clauses are present. For example, for the
> query below, mergejoin runs much faster than hashjoin and nestloop,
> and mergejoin with incremental sort is even faster than mergejoin with
> full sort.
>

Sure, an incremental sort can improve things if things go well, no doubt
about that. But how significant can the improvement be, especially if we
need to sort everything? In your example it's ~15%, which is nice, and
maybe the benefits can be much larger if the incremental sort can do
everything in memory, without flushing to disk.

But what about the risks? What will happen if the groups are not this
uniformly and independently sized, and stuff like that? Maybe it'll be
costed well enough, I haven't tried.

I don't recall the exact reasoning for why we didn't add incremental
sort in various places, I'd have to dig into the old threads, or
something. But I believe thinking about these risks was part of it -
trying to handle places where the potential benefits are much larger
than the risks.

As I wrote earlier, it's not my intent to discourage you from working on
this. Please do, I'm sure it can be improved.

regards

--
Tomas Vondra

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2024-09-13 11:56:52 Re: Pgoutput not capturing the generated columns
Previous Message Seino Yuki 2024-09-13 11:49:36 Add “FOR UPDATE NOWAIT” lock details to the log.