Quick Links

Re: MergeAppend could consider sorting cheapest child path

From:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
To:	Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>, Andy Fan <zhihuifan1213(at)163(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Nikita Malakhov <HukuToc(at)gmail(dot)com>
Subject:	Re: MergeAppend could consider sorting cheapest child path
Date:	2025-04-24 13:01:18
Message-ID:	ef50ac7b-d04b-462d-930a-114ae77ce7b2@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 3/28/25 09:19, Alexander Pyhalov wrote:
> Andy Fan писал(а) 2024-10-17 03:33:
> I've updated patch. One more interesting case which we found - when
> fractional path is selected, it still can be more expensive than sorted
> cheapest total path (as we look only on indexes whith necessary
> pathkeys, not on indexes which allow efficiently fetch data).
> So far couldn't find artificial example, but we've seen inadequate index
> selection due to this issue - instead of using index suited for filters
> in where, index, suitable for sorting was selected as one having the
> cheapest fractional cost.
I think it is necessary to generalise the approach a little.

Each MergeAppend subpath candidate that fits pathkeys should be compared
to the overall-optimal path + explicit Sort node. Let's label this
two-node composition as base_path. There are three cases exist:
startup-optimal, total-optimal and fractional-optimal.
In the startup case, base_path should use cheapest_startup_path, the
total-optimal case - cheapest_total_path and for a fractional case, we
may employ the get_cheapest_fractional_path routine to detect proper
base_path.

It may provide a more effective plan either in full, fractional and
partial scan cases:
1. The Sort node may be pushed down to subpaths under a parallel or
async Append.
2. When a minor set of subpaths doesn't have a proper index, and it is
profitable to sort them instead of switching to plain Append.

In general, analysing the regression tests changed by this code, I see
that the cost model prefers a series of small sortings instead of a
single massive one. May be it will show some profit for execution time.

I am not afraid of any palpable planning overhead here because we just
do cheap cost estimation and comparison operations that don't need
additional memory allocations. The caller is responsible for building a
proper Sort node if this method is chosen as optimal.

In the attachment, see the patch written according to the idea. There
are I introduced two new routines:
get_cheapest_path_for_pathkeys_ext
get_cheapest_fractional_path_for_pathkeys_ext

I have designed the code that way to reduce duplicated code in the
generate_orderedappend_paths routine. But the main point is that I
envision these new routines may be reused elsewhere.

This feature looks сlose to the one we discussed before [1]. So, let me
CC the people from the previous conversation to the discussion.

[1]
https://www.postgresql.org/message-id/flat/CAN-LCVPxnWB39CUBTgOQ9O7Dd8DrA_tpT1EY3LNVnUuvAX1NjA%40mail.gmail.com

--
regards, Andrei Lepikhov

Attachment	Content-Type	Size
v0-0001-Consider-explicit-sort-of-the-MergeAppend-subpath.patch	text/x-patch	28.2 KB

In response to

Re: MergeAppend could consider sorting cheapest child path at 2025-03-28 08:19:30 from Alexander Pyhalov

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Christoph Berg	2025-04-24 13:25:55	Re: vacuumdb --missing-stats-only and pg_upgrade from PG13
Previous Message	Bruce Momjian	2025-04-24 12:58:41	Re: pg_upgrade-breaking release