Re: POC: GROUP BY optimization

From: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Richard Guo <guofenglinux(at)gmail(dot)com>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>
Subject: Re: POC: GROUP BY optimization
Date: 2024-05-30 04:22:46
Message-ID: 3e755fd2-cf59-49d2-ae95-38c78daeba85@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 5/29/24 19:53, Alexander Korotkov wrote:
> Hi, Andrei!
>
> Thank you for your feedback.
>
> On Wed, May 29, 2024 at 11:08 AM Andrei Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> On 5/27/24 19:41, Alexander Korotkov wrote:
>>> Any thoughts?
>> About 0001:
>> Having overviewed it, I don't see any issues (but I'm the author),
>> except grammatical ones - but I'm not a native to judge it.
>> Also, the sentence 'turning GROUP BY clauses into pathkeys' is unclear
>> to me. It may be better to write something like: 'building pathkeys by
>> the list of grouping clauses'.
>
> OK, thank you. I'll run once again for the grammar issues.
>
>> 0002:
>> The part under USE_ASSERT_CHECKING looks good to me. But the code in
>> group_keys_reorder_by_pathkeys looks suspicious: of course, we do some
>> doubtful work without any possible way to reproduce, but if we envision
>> some duplicated elements in the group_clauses, we should avoid usage of
>> the list_concat_unique_ptr.
>
> As I understand Tom, there is a risk that clauses list may contain
> multiple instances of equivalent SortGroupClause, not duplicate
> pointers.
Maybe. I just implemented the worst-case scenario with the intention of
maximum safety. So, it's up to you.
>
>> What's more, why do you not exit from
>> foreach_ptr immediately after SortGroupClause has been found? I think
>> the new_group_clauses should be consistent with the new_group_pathkeys.
>
> I wanted this to be consistent with preprocess_groupclause(), where
> duplicate SortGroupClause'es are grouped together. Otherwise, we
> could just delete redundant SortGroupClause'es.
Hm, preprocess_groupclause is called before the standard_qp_callback
forms the 'canonical form' of group_pathkeys and such behaviour needed.
But the code that chooses the reordering strategy uses already processed
grouping clauses, where we don't have duplicates in the first
num_groupby_pathkeys elements, do we?
>> 0004:
>> I was also thinking about reintroducing the preprocess_groupclause
>> because with the re-arrangement of GROUP-BY clauses according to
>> incoming pathkeys, it doesn't make sense to have a user-defined order—at
>> least while cost_sort doesn't differ costs for alternative column orderings.
>> So, I'm okay with the code. But why don't you use the same approach with
>> foreach_ptr as before?
>
> I restored the function as it was before 0452b461bc with minimal edits
> to support the incremental sort. I think it would be more valuable to
> keep the difference with pg16 code small rather than refactor to
> simplify existing code.
Got it

--
regards,
Andrei Lepikhov
Postgres Professional

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2024-05-30 04:48:11 Re: 001_rep_changes.pl fails due to publisher stuck on shutdown
Previous Message David Rowley 2024-05-30 04:02:58 Re: Avoid an odd undefined behavior with memcmp (src/bin/pg_rewind/pg_rewind.c)