Re: Why don't we consider explicit Incremental Sort?

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why don't we consider explicit Incremental Sort?
Date: 2024-09-13 11:35:35
Message-ID: 43ee4d6a-3531-4bdd-bd41-bad7b2c67281@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 9/13/24 06:04, Richard Guo wrote:
> On Mon, Sep 9, 2024 at 6:22 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>> I think we intentionally added incremental sort ... incrementally ;-)
>
> Haha, right.
>
>> I think one challenge with this case is that create_mergejoin_plan
>> creates these Sort plans explicitly - it's not "pathified" so it doesn't
>> go through the usual cost comparison etc. And it certainly is not the
>> case that incremental sort would win always, so we'd need to replicate
>> the cost comparison logic too.
>>
>> I have not thought about this particular case too much, but how likely
>> is it that mergejoin will win for this plan in practice? If we consider
>> a query that only needs a fraction of rows (say, thanks to LIMIT),
>> aren't we likely to pick a nested loop (which can do incremental sort
>> for the outer plan)? For joins that need to run to completion it might
>> be a win, but there's also the higher risk of a poor costing.
>
> Yeah, currently mergejoin path always assumes that full sort would be
> used on top of the outer path and inner path if they are not already
> ordered appropriately. So initial_cost_mergejoin directly charges the
> cost of full sort into outer/inner path's cost, without going through
> the usual cost comparison with incremental sort.
>
> It seems to me that some parts of our code assume that, for a given
> input path that is partially ordered, an incremental sort is always
> preferable to a full sort (see commit 4a29eabd1). Am I getting this
> correctly?

I don't think we're making such assumption. I don't know which of the
many places modified by 4a29eabd1 you have in mind, but IIRC we always
consider both full and incremental sort.

> If this is the case, then I think using the following
> outer path for the merge join:
>
> -> Incremental Sort
> Sort Key: t1.a, t1.b
> Presorted Key: t1.a
> -> Index Scan using t1_a_idx on t1
>
> ... would be an immediate improvement over the current path, which is:
>
> -> Sort
> Sort Key: t1.a, t1.b
> -> Index Scan using t1_a_idx on t1
>
>
> The shaky cost estimates for incremental sort that you mentioned are
> indeed a concern. Fortunately we have the enable_incremental_sort GUC
> already. As in may other parts of the code (such as in the function
> add_paths_to_grouping_rel), we can always disable incremental sort to
> fall back to a full sort if needed.
>

I think our goal should be to not rely on the enable_incremental_sort
GUC as a defense very much. It's a very blunt instrument, that often
forces users to pick whether they prefer to improve one query while
harming some other queries. I personally consider these enable_ GUC more
a development tool than something suitable for users.

>> I'm not saying it's not worth exploring, I'm just trying recall reasons
>> why it might not work. Also I don't think it's fundamentally impossible
>> to do mark/restore for incremental sort - I just haven't tried doing it
>> because it wasn't necessary. In the worst case we could simply add a
>> Materialize node on top, no?
>
> Yeah, perhaps we could support mark/restore for incremental sort
> someday. This would allow us to apply incremental sort to the inner
> path of a merge join too. But if we apply a Material node on top of
> the inner due to the lack of mark/restore support for incremental
> sort, then we will need to compare two paths:
>
> partially sorted path -> incremental sort -> material
>
> VS.
>
> partially sorted path -> full sort
>
> I think it's hard to tell which is cheaper without a cost comparison,
> which we do not have for now.
>

How is this different from the incremental sort costing we already have?

>
> To help with the discussion, I drafted a WIP patch as attached, which
> chooses an incremental sort over a full sort on the given outer path
> of a mergejoin whenever possible. There is one ensuing plan diff in
> the regression tests (not included in the patch). It seems that some
> query in the tests begins to use incremental sort for mergejoin.
>

I'm not against the patch in principle, but I haven't thought about the
costing and risk very much. If I have time I'll try to do some more
experiments to see how it behaves, but no promises.

regards

--
Tomas Vondra

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Srirama Kucherlapati 2024-09-13 11:49:24 RE: AIX support
Previous Message Shubham Khanna 2024-09-13 11:34:06 Re: Pgoutput not capturing the generated columns