From: | Andrei Lepikhov <lepihov(at)gmail(dot)com> |
---|---|
To: | Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>, Andy Fan <zhihuifan1213(at)163(dot)com> |
Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Nikita Malakhov <HukuToc(at)gmail(dot)com> |
Subject: | Re: MergeAppend could consider sorting cheapest child path |
Date: | 2025-04-24 13:01:18 |
Message-ID: | ef50ac7b-d04b-462d-930a-114ae77ce7b2@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 3/28/25 09:19, Alexander Pyhalov wrote:
> Andy Fan писал(а) 2024-10-17 03:33:
> I've updated patch. One more interesting case which we found - when
> fractional path is selected, it still can be more expensive than sorted
> cheapest total path (as we look only on indexes whith necessary
> pathkeys, not on indexes which allow efficiently fetch data).
> So far couldn't find artificial example, but we've seen inadequate index
> selection due to this issue - instead of using index suited for filters
> in where, index, suitable for sorting was selected as one having the
> cheapest fractional cost.
I think it is necessary to generalise the approach a little.
Each MergeAppend subpath candidate that fits pathkeys should be compared
to the overall-optimal path + explicit Sort node. Let's label this
two-node composition as base_path. There are three cases exist:
startup-optimal, total-optimal and fractional-optimal.
In the startup case, base_path should use cheapest_startup_path, the
total-optimal case - cheapest_total_path and for a fractional case, we
may employ the get_cheapest_fractional_path routine to detect proper
base_path.
It may provide a more effective plan either in full, fractional and
partial scan cases:
1. The Sort node may be pushed down to subpaths under a parallel or
async Append.
2. When a minor set of subpaths doesn't have a proper index, and it is
profitable to sort them instead of switching to plain Append.
In general, analysing the regression tests changed by this code, I see
that the cost model prefers a series of small sortings instead of a
single massive one. May be it will show some profit for execution time.
I am not afraid of any palpable planning overhead here because we just
do cheap cost estimation and comparison operations that don't need
additional memory allocations. The caller is responsible for building a
proper Sort node if this method is chosen as optimal.
In the attachment, see the patch written according to the idea. There
are I introduced two new routines:
get_cheapest_path_for_pathkeys_ext
get_cheapest_fractional_path_for_pathkeys_ext
I have designed the code that way to reduce duplicated code in the
generate_orderedappend_paths routine. But the main point is that I
envision these new routines may be reused elsewhere.
This feature looks сlose to the one we discussed before [1]. So, let me
CC the people from the previous conversation to the discussion.
--
regards, Andrei Lepikhov
Attachment | Content-Type | Size |
---|---|---|
v0-0001-Consider-explicit-sort-of-the-MergeAppend-subpath.patch | text/x-patch | 28.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Christoph Berg | 2025-04-24 13:25:55 | Re: vacuumdb --missing-stats-only and pg_upgrade from PG13 |
Previous Message | Bruce Momjian | 2025-04-24 12:58:41 | Re: pg_upgrade-breaking release |