Quick Links

Re: Consider the number of columns in the sort cost model

From:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
To:	Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>
Cc:	Kirill Reshke <reshkekirill(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Consider the number of columns in the sort cost model
Date:	2024-10-29 02:47:27
Message-ID:	bea26aec-b9fd-4fac-a06e-bbad4f154287@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 10/28/24 16:48, Alena Rybakina wrote:
> On 23.10.2024 04:39, Andrei Lepikhov wrote:
>> On 15/10/2024 12:15, David Rowley wrote:
>> And the last patch is a demo of how I'm going to use the previous
>> three patches and add one more strategy to improve the order of
>> columns in the GROUP-BY clause.
>>
> To be honest, I didn’t find information about this in the code, but did
> I understand everything correctly?
Yes
>
> 2. I noticed that statistics of distinct values are calculated several
> times. Maybe I miss something, but this can be avoided if these
> statistics can be saved after calculation. For example, I saw that it is
> possible that you calculate the same statistic information for the same
> equivalence members in cost_incremental_sort and identify_sort_ecmember.
> Is it possible to store information in a structure and use it later?
Hmm, I don't see multiple calculations. em_distinct has made
specifically for this sake. Can you provide specific case and code lines?
>
> 3. I think you should initialize the variable ndist in your patch. I
> faced the compile complaining during your code compilation.
>
> costsize.c: In function ‘identify_sort_ecmember’:
> costsize.c:6694:42: error: ‘ndist’ may be used uninitialized
> [-Werror=maybe-uninitialized]
> 6694 |                         em->em_ndistinct = ndist;
>       |                         ~~~~~~~~~~~~~~~~~^~~~~
> costsize.c:6680:33: note: ‘ndist’ was declared here
> 6680 |                         double ndist;
>       |                                 ^~~
> cc1: all warnings being treated as errors
> gmake[4]: *** [<builtin>: costsize.o] Error 1
I think you can just update your compiler. But I added the ndist
initialisation to make more compilers happy :).

> +       Assert (node != NULL);
> +
>         examine_variable(root, node, 0, &vardata);
>         if (!HeapTupleIsValid(vardata.statsTuple))
>             continue;
I don't think so. At least until you provide the case when the
get_sortgroupclause_expr function returns NULL.
That's more, remember - the patch 0004 here is just to show the
perspective and still under construction.
Anyway, thanks, I found out that the patch set doesn't apply correctly
because of 828e94c. So, see the new version in the attachment.

--
regards, Andrei Lepikhov

Attachment	Content-Type	Size
v2-0001-Stabilise-incremental-sort-cost-calculation.patch	text/x-patch	9.2 KB
v2-0002-Consider-the-number-of-columns-in-the-cost-sort-m.patch	text/x-patch	13.8 KB
v2-0003-Consider-ndistinct-on-the-first-column-in-cost_so.patch	text/x-patch	8.7 KB
v2-0004-Add-GROUP-BY-strategy-put-the-most-distinct-colum.patch	text/x-patch	7.1 KB

In response to

Re: Consider the number of columns in the sort cost model at 2024-10-28 09:48:19 from Alena Rybakina

Responses

Re: Consider the number of columns in the sort cost model at 2024-10-31 14:00:09 from Alena Rybakina

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	David Rowley	2024-10-29 03:06:25	Re: EXPLAIN IndexOnlyScan shows disabled when enable_indexonlyscan=on
Previous Message	Peter Smith	2024-10-29 02:14:17	Re: Pgoutput not capturing the generated columns