From: | Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> |
---|---|
To: | Andrei Lepikhov <lepihov(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Kirill Reshke <reshkekirill(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Consider the number of columns in the sort cost model |
Date: | 2024-10-28 09:48:19 |
Message-ID: | 0870a8c4-4579-4428-bacb-c604463ab2ae@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi! Thank you for your work on this subject!
On 23.10.2024 04:39, Andrei Lepikhov wrote:
> On 15/10/2024 12:15, David Rowley wrote:
>> On Tue, 15 Oct 2024 at 17:48, Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>> I think maybe what is worth working on is seeing if you can better
>> estimate the number of tiebreak comparisons by checking the number of
>> distinct values for each sort key. That might require what we
>> discussed on another thread about having estimate_num_groups() better
>> use the n_distinct estimate from the EquivalenceMember with the least
>> distinct values. It'll be another question if all that can be done
>> and this all still perform well enough for cost_sort(). You may have
>> to write it first before we can figure that out. It may be possible
>> to buy back some of the overheads with some additional caching...
>> Perhaps that could be done within the EquivalenceClass.
> It seems that your idea with an equivalence class works.
> In the patchset attached, I show all the ideas (I initially wanted to
> discuss it step-by-step).
> In the attached, the first patch is about choosing adequate expression
> from the list of EC members.
> The second one is the current patch, which considers the number of
> sort columns.
> Third patch - elaboration of the second patch. Use only the trustful
> statistics here - the number of ndistinct values on the first sorted
> column.
> And the last patch is a demo of how I'm going to use the previous
> three patches and add one more strategy to improve the order of
> columns in the GROUP-BY clause.
>
I have some question about your patches.
1. You consider two lists root->group_pathkeys and
root->processed_groupClause.
As far as I understand, this is due to the fact that the pathkey
elements are identical to the group clause and identity verification is
precisely checked by this condition:
if (foreach_current_index(lc1) >= root->num_groupby_pathkeys)
To be honest, I didn’t find information about this in the code, but did
I understand everything correctly?
2. I noticed that statistics of distinct values are calculated several
times. Maybe I miss something, but this can be avoided if these
statistics can be saved after calculation. For example, I saw that it is
possible that you calculate the same statistic information for the same
equivalence members in cost_incremental_sort and identify_sort_ecmember.
Is it possible to store information in a structure and use it later?
3. I think you should initialize the variable ndist in your patch. I
faced the compile complaining during your code compilation.
costsize.c: In function ‘identify_sort_ecmember’:
costsize.c:6694:42: error: ‘ndist’ may be used uninitialized
[-Werror=maybe-uninitialized]
6694 | em->em_ndistinct = ndist;
| ~~~~~~~~~~~~~~~~~^~~~~
costsize.c:6680:33: note: ‘ndist’ was declared here
6680 | double ndist;
| ^~~
cc1: all warnings being treated as errors
gmake[4]: *** [<builtin>: costsize.o] Error 1
4. Before executing examine_variable function I think you should check
that it is not null.
@@ -575,6 +575,8 @@ get_useful_group_keys_orderings(PlannerInfo *root,
Path *path)
*/
continue;
+ Assert (node != NULL);
+
examine_variable(root, node, 0, &vardata);
if (!HeapTupleIsValid(vardata.statsTuple))
continue;
--
Regards,
Alena Rybakina
Postgres Professional
From | Date | Subject | |
---|---|---|---|
Next Message | Joel Jacobson | 2024-10-28 10:20:57 | Re: New "raw" COPY format |
Previous Message | Peter Eisentraut | 2024-10-28 09:45:21 | Re: further #include cleanup (IWYU) |