Re: Consider the number of columns in the sort cost model

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Kirill Reshke <reshkekirill(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Consider the number of columns in the sort cost model
Date: 2024-10-29 02:47:27
Message-ID: bea26aec-b9fd-4fac-a06e-bbad4f154287@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/28/24 16:48, Alena Rybakina wrote:
> On 23.10.2024 04:39, Andrei Lepikhov wrote:
>> On 15/10/2024 12:15, David Rowley wrote:
>> And the last patch is a demo of how I'm going to use the previous
>> three patches and add one more strategy to improve the order of
>> columns in the GROUP-BY clause.
>>
> To be honest, I didn’t find information about this in the code, but did
> I understand everything correctly?
Yes
>
> 2. I noticed that statistics of distinct values ​​are calculated several
> times. Maybe I miss something, but this can be avoided if these
> statistics can be saved after calculation. For example, I saw that it is
> possible that you calculate the same statistic information for the same
> equivalence members in cost_incremental_sort and identify_sort_ecmember.
> Is it possible to store information in a structure and use it later?
Hmm, I don't see multiple calculations. em_distinct has made
specifically for this sake. Can you provide specific case and code lines?
>
> 3. I think you should initialize the variable ndist in your patch. I
> faced the compile complaining during your code compilation.
>
> costsize.c: In function ‘identify_sort_ecmember’:
> costsize.c:6694:42: error: ‘ndist’ may be used uninitialized
> [-Werror=maybe-uninitialized]
>  6694 |                         em->em_ndistinct = ndist;
>       |                         ~~~~~~~~~~~~~~~~~^~~~~
> costsize.c:6680:33: note: ‘ndist’ was declared here
>  6680 |                         double  ndist;
>       |                                 ^~~
> cc1: all warnings being treated as errors
> gmake[4]: *** [<builtin>: costsize.o] Error 1
I think you can just update your compiler. But I added the ndist
initialisation to make more compilers happy :).

> +        Assert (node != NULL);
> +
>          examine_variable(root, node, 0, &vardata);
>          if (!HeapTupleIsValid(vardata.statsTuple))
>              continue;
I don't think so. At least until you provide the case when the
get_sortgroupclause_expr function returns NULL.
That's more, remember - the patch 0004 here is just to show the
perspective and still under construction.
Anyway, thanks, I found out that the patch set doesn't apply correctly
because of 828e94c. So, see the new version in the attachment.

--
regards, Andrei Lepikhov

Attachment Content-Type Size
v2-0001-Stabilise-incremental-sort-cost-calculation.patch text/x-patch 9.2 KB
v2-0002-Consider-the-number-of-columns-in-the-cost-sort-m.patch text/x-patch 13.8 KB
v2-0003-Consider-ndistinct-on-the-first-column-in-cost_so.patch text/x-patch 8.7 KB
v2-0004-Add-GROUP-BY-strategy-put-the-most-distinct-colum.patch text/x-patch 7.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2024-10-29 03:06:25 Re: EXPLAIN IndexOnlyScan shows disabled when enable_indexonlyscan=on
Previous Message Peter Smith 2024-10-29 02:14:17 Re: Pgoutput not capturing the generated columns