Re: MergeAppend could consider sorting cheapest child path

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
Cc: Andy Fan <zhihuifan1213(at)163(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Nikita Malakhov <HukuToc(at)gmail(dot)com>
Subject: Re: MergeAppend could consider sorting cheapest child path
Date: 2025-04-25 15:13:26
Message-ID: 1e9be983-467c-41bf-91d7-9565bff5825f@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4/25/25 11:16, Alexander Pyhalov wrote:
> Andrei Lepikhov писал(а) 2025-04-24 16:01:
>> On 3/28/25 09:19, Alexander Pyhalov wrote:
>> In the attachment, see the patch written according to the idea. There
>> are I introduced two new routines:
>> get_cheapest_path_for_pathkeys_ext
>> get_cheapest_fractional_path_for_pathkeys_ext
>
> Hi. I'm a bit confused that
Thanks for the participation!

> get_cheapest_fractional_path_for_pathkeys_ext() looks only on sorting
> cheapest fractional path, and get_cheapest_path_for_pathkeys_ext() in
> STARTUP_COST case looks only on sorting cheapest_startup_path.
At first, at the moment, I don't understand why we calculate the
cheapest_startup path at all. In my opinion, after commit 6b94e7a [1,
2], the min-fractional path totally covers the case. I began this
discussion in [3] - maybe we need to scrutinise that issue beforehand.

Looking into the min-fractional-path + Sort, we propose a path for the
case when extracting minor part of tuples with sorting later is cheaper
than doing a massive job of non-selective index scan. You also may
imagine the case of a JOIN as a subpath: non-sorted, highly selective
parameterised NestLoop may be way more optimal than MergeJoin, which
fits the pathkeys.

> Usually, sorted cheapest_total_path will be cheaper than sorted
> fractional/startup path at least by startup cost (as after sorting it
> includes total_cost of input path). But we ignore this case when
> selecting cheapest_startup and cheapest_fractional subpaths. As result
> selected cheapest_startup and cheapest_fractional can be not cheapest
> for startup or selecting a fraction of rows.
I don't know what you mean by that. The cheapest_total_path is
considered when we chose optimal cheapest_total path. The same works for
the fractional path - get_cheapest_fractional_path gives us the most
optimal fractional path and probes cheapest_total_path too.
As above, not sure about min-startup case for now. I can imagine
MergeAppend over sophisticated subquery: non-sorted includes highly
parameterised JOINs and the alternative (with pathkeys) includes
HashJoin, drastically increasing startup cost. It is only a theory, of
course. So, lets discover how min-startup works.

At the end, when the sorted path already estimated, we each time compare
it with previously selected pathkeys-cheapest. So, if the sorted path is
worse, we end up with the original path and don't lose anything.

[1]
https://www.postgresql.org/message-id/e8f9ec90-546d-e948-acce-0525f3e92773%40enterprisedb.com
[2]
https://www.postgresql.org/message-id/1581042da8044e71ada2d6e3a51bf7bb%40index.de
[3]
https://www.postgresql.org/message-id/f0206ef2-6b5a-4d07-8770-cfa7cd30f685@gmail.com

--
regards, Andrei Lepikhov

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikhil Kumar Veldanda 2025-04-25 15:15:26 Re: ZStandard (with dictionaries) compression support for TOAST compression
Previous Message Erik Rijkers 2025-04-25 15:05:15 gcc 15.1 warnings - jsonb_util.c