Re: MergeAppend could consider sorting cheapest child path

From: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
To: Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc: Andy Fan <zhihuifan1213(at)163(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Nikita Malakhov <HukuToc(at)gmail(dot)com>
Subject: Re: MergeAppend could consider sorting cheapest child path
Date: 2025-04-25 09:16:29
Message-ID: 96e6b45ca97ac61657ca8c4d65e135f8@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andrei Lepikhov писал(а) 2025-04-24 16:01:
> On 3/28/25 09:19, Alexander Pyhalov wrote:
>> Andy Fan писал(а) 2024-10-17 03:33:
>> I've updated patch. One more interesting case which we found - when
>> fractional path is selected, it still can be more expensive than
>> sorted cheapest total path (as we look only on indexes whith necessary
>> pathkeys, not on indexes which allow efficiently fetch data).
>> So far couldn't find artificial example, but we've seen inadequate
>> index selection due to this issue - instead of using index suited for
>> filters in where, index, suitable for sorting was selected as one
>> having the cheapest fractional cost.
> I think it is necessary to generalise the approach a little.
>
> Each MergeAppend subpath candidate that fits pathkeys should be
> compared to the overall-optimal path + explicit Sort node. Let's label
> this two-node composition as base_path. There are three cases exist:
> startup-optimal, total-optimal and fractional-optimal.
> In the startup case, base_path should use cheapest_startup_path, the
> total-optimal case - cheapest_total_path and for a fractional case, we
> may employ the get_cheapest_fractional_path routine to detect proper
> base_path.
>
> It may provide a more effective plan either in full, fractional and
> partial scan cases:
> 1. The Sort node may be pushed down to subpaths under a parallel or
> async Append.
> 2. When a minor set of subpaths doesn't have a proper index, and it is
> profitable to sort them instead of switching to plain Append.
>
> In general, analysing the regression tests changed by this code, I see
> that the cost model prefers a series of small sortings instead of a
> single massive one. May be it will show some profit for execution time.
>
> I am not afraid of any palpable planning overhead here because we just
> do cheap cost estimation and comparison operations that don't need
> additional memory allocations. The caller is responsible for building a
> proper Sort node if this method is chosen as optimal.
>
> In the attachment, see the patch written according to the idea. There
> are I introduced two new routines:
> get_cheapest_path_for_pathkeys_ext
> get_cheapest_fractional_path_for_pathkeys_ext

Hi. I'm a bit confused that
get_cheapest_fractional_path_for_pathkeys_ext() looks only on sorting
cheapest fractional path, and get_cheapest_path_for_pathkeys_ext() in
STARTUP_COST case looks only on sorting cheapest_startup_path.
Usually, sorted cheapest_total_path will be cheaper than sorted
fractional/startup path at least by startup cost (as after sorting it
includes total_cost of input path). But we ignore this case when
selecting cheapest_startup and cheapest_fractional subpaths. As result
selected cheapest_startup and cheapest_fractional can be not cheapest
for startup or selecting a fraction of rows.

Consider the partition with the following access paths:

1) cheapest_startup without required pathkeys:
startup_cost: 0.42
total_cost: 4004

2) some index_path with required pathkeys:
startup_cost: 6.6
total_cost: 2000

3) cheapest_total_path:
startup_cost: 0.42
total_cost: 3.48

Here, when selecting cheapest startup subpath we'll compare costs of
index path (2) and sorted cheapest_startup (1), but will ignore sorted
cheapest_total_path (3), despite the fact that it really can be the
cheapest startup path, providing required sorting order.

--
Best regards,
Alexander Pyhalov,
Postgres Professional

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message George MacKerron 2025-04-25 10:20:55 Re: Making sslrootcert=system work on Windows psql
Previous Message Amit Kapila 2025-04-25 09:08:51 Re: long-standing data loss bug in initial sync of logical replication