Re: Why don't we consider explicit Incremental Sort?

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why don't we consider explicit Incremental Sort?
Date: 2024-09-23 03:03:37
Message-ID: CAMbWs48xVJp62Yp82T-R8_SChc2-WcU1FE0J6B3j2yHNOFVViA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Sep 22, 2024 at 1:38 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> Just looking at the commit message:
>
> > The rationale is based on the assumption that incremental sort is
> > always faster than full sort when there are presorted keys, a premise
> > that has been applied in various parts of the code. This assumption
> > does not always hold, particularly in cases with a large skew in the
> > number of rows within the presorted groups.
>
> My understanding is that the worst case for incremental sort is the
> same as sort, i.e. only 1 presorted group, which is the same effort to
> sort. Is there something I'm missing?

I was thinking that when there’s a large skew in the number of rows
per pre-sorted group, incremental sort might underperform full sort.
As mentioned in the comments for cost_incremental_sort, it has to
detect the sort groups, plus it needs to perform tuplesort_reset after
each group. However, I doubt these factors would have a substantial
impact on the performance of incremental sort. So maybe in the worst
skewed groups case, incremental sort may still perform similarly to
full sort.

Another less obvious reason is that in cases of skewed groups, we may
significantly underestimate the cost of incremental sort. This could
result in choosing a more expensive subpath under the sort. Such as
the example in [1], we end up with IndexScan->IncrementalSort rather
than SeqScan->FullSort, while the former plan is more expensive to
execute. However, this point does not affect this patch, because for
a mergejoin, we only consider outerrel's cheapest-total-cost when we
assume that an explicit sort atop is needed, i.e., we do not have a
chance to select which subpath to use in this case.

[1] https://postgr.es/m/CAMbWs49+CXsrgbq0LD1at-5jR=AHHN0YtDy9YvgXAsMfndZe-w@mail.gmail.com

Thanks
Richard

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message jian he 2024-09-23 03:28:00 Re: ANALYZE ONLY
Previous Message Amit Kapila 2024-09-23 02:47:16 Re: Pgoutput not capturing the generated columns