From: | Tomas Vondra <tomas(at)vondra(dot)me> |
---|---|
To: | Richard Guo <guofenglinux(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Why don't we consider explicit Incremental Sort? |
Date: | 2024-09-13 11:35:35 |
Message-ID: | 43ee4d6a-3531-4bdd-bd41-bad7b2c67281@vondra.me |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 9/13/24 06:04, Richard Guo wrote:
> On Mon, Sep 9, 2024 at 6:22 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>> I think we intentionally added incremental sort ... incrementally ;-)
>
> Haha, right.
>
>> I think one challenge with this case is that create_mergejoin_plan
>> creates these Sort plans explicitly - it's not "pathified" so it doesn't
>> go through the usual cost comparison etc. And it certainly is not the
>> case that incremental sort would win always, so we'd need to replicate
>> the cost comparison logic too.
>>
>> I have not thought about this particular case too much, but how likely
>> is it that mergejoin will win for this plan in practice? If we consider
>> a query that only needs a fraction of rows (say, thanks to LIMIT),
>> aren't we likely to pick a nested loop (which can do incremental sort
>> for the outer plan)? For joins that need to run to completion it might
>> be a win, but there's also the higher risk of a poor costing.
>
> Yeah, currently mergejoin path always assumes that full sort would be
> used on top of the outer path and inner path if they are not already
> ordered appropriately. So initial_cost_mergejoin directly charges the
> cost of full sort into outer/inner path's cost, without going through
> the usual cost comparison with incremental sort.
>
> It seems to me that some parts of our code assume that, for a given
> input path that is partially ordered, an incremental sort is always
> preferable to a full sort (see commit 4a29eabd1). Am I getting this
> correctly?
I don't think we're making such assumption. I don't know which of the
many places modified by 4a29eabd1 you have in mind, but IIRC we always
consider both full and incremental sort.
> If this is the case, then I think using the following
> outer path for the merge join:
>
> -> Incremental Sort
> Sort Key: t1.a, t1.b
> Presorted Key: t1.a
> -> Index Scan using t1_a_idx on t1
>
> ... would be an immediate improvement over the current path, which is:
>
> -> Sort
> Sort Key: t1.a, t1.b
> -> Index Scan using t1_a_idx on t1
>
>
> The shaky cost estimates for incremental sort that you mentioned are
> indeed a concern. Fortunately we have the enable_incremental_sort GUC
> already. As in may other parts of the code (such as in the function
> add_paths_to_grouping_rel), we can always disable incremental sort to
> fall back to a full sort if needed.
>
I think our goal should be to not rely on the enable_incremental_sort
GUC as a defense very much. It's a very blunt instrument, that often
forces users to pick whether they prefer to improve one query while
harming some other queries. I personally consider these enable_ GUC more
a development tool than something suitable for users.
>> I'm not saying it's not worth exploring, I'm just trying recall reasons
>> why it might not work. Also I don't think it's fundamentally impossible
>> to do mark/restore for incremental sort - I just haven't tried doing it
>> because it wasn't necessary. In the worst case we could simply add a
>> Materialize node on top, no?
>
> Yeah, perhaps we could support mark/restore for incremental sort
> someday. This would allow us to apply incremental sort to the inner
> path of a merge join too. But if we apply a Material node on top of
> the inner due to the lack of mark/restore support for incremental
> sort, then we will need to compare two paths:
>
> partially sorted path -> incremental sort -> material
>
> VS.
>
> partially sorted path -> full sort
>
> I think it's hard to tell which is cheaper without a cost comparison,
> which we do not have for now.
>
How is this different from the incremental sort costing we already have?
>
> To help with the discussion, I drafted a WIP patch as attached, which
> chooses an incremental sort over a full sort on the given outer path
> of a mergejoin whenever possible. There is one ensuing plan diff in
> the regression tests (not included in the patch). It seems that some
> query in the tests begins to use incremental sort for mergejoin.
>
I'm not against the patch in principle, but I haven't thought about the
costing and risk very much. If I have time I'll try to do some more
experiments to see how it behaves, but no promises.
regards
--
Tomas Vondra
From | Date | Subject | |
---|---|---|---|
Next Message | Srirama Kucherlapati | 2024-09-13 11:49:24 | RE: AIX support |
Previous Message | Shubham Khanna | 2024-09-13 11:34:06 | Re: Pgoutput not capturing the generated columns |