Re: POC: GROUP BY optimization

From: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>, Белялов Дамир Наилевич <d(dot)belyalov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: POC: GROUP BY optimization
Date: 2023-12-26 11:37:01
Message-ID: 6c67b7bb-5469-475f-8988-0a50b4c64852@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 21/12/2023 17:53, Alexander Korotkov wrote:
> On Sun, Oct 1, 2023 at 11:45 AM Andrei Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> New version of the patch. Fixed minor inconsistencies and rebased onto
>> current master.
> Thank you (and other authors) for working on this subject. Indeed to
> GROUP BY clauses are order-agnostic. Reordering them in the most
> suitable order could give up significant query planning benefits. I
> went through the thread: I see significant work has been already made
> on this patch, the code is quite polished.
Maybe, but issues, mentioned in [1], still not resolved. It is the only
reason, why this thread hasn't been active.
> I'd like to make some notes.
> 1) As already mentioned, there is clearly a repetitive pattern for the
> code following after get_useful_group_keys_orderings() calls. I think
> it would be good to extract it into a separate function. Please, do
> this as a separate patch coming before the group-by patch. That would
> simplify the review.
Yeah, these parts of code a bit different. I will try to make common
routine.
> 2) I wonder what planning overhead this patch could introduce? Could
> you try to measure the worst case? What if we have a table with a lot
> of indexes and a long list of group-by clauses partially patching
> every index. This should give us an understanding on whether we need
> a separate GUC to control this feature.
Ok> 3) I see that get_useful_group_keys_orderings() makes 3 calls to
> get_cheapest_group_keys_order() function. Each time
> get_cheapest_group_keys_order() performs the cost estimate and
> reorders the free keys. However, cost estimation implies the system
> catalog lookups (that is quite expensive). I wonder if we could
> change the algorithm. Could we just sort the group-by keys by cost
> once, save this ordering and then just re-use it. So, every time we
> need to reorder a group by, we can just pull the required keys to the
> top and use saved ordering for the rest. I also wonder if we could do
> this once for add_paths_to_grouping_rel() and
> create_partial_grouping_paths() calls. So, it probably should be
> somewhere in create_ordinary_grouping_paths().
Thanks for the idea!> 4) I think we can do some optimizations when
enable_incremental_sort
> == off. Then in get_useful_group_keys_orderings() we should only deal
> with input_path fully matching the group-by clause, and try only full
> match of group-by output to the required order.
Oh, we had designed before the incremental sort was invented. Will see
what we can do here.

[1]
https://www.postgresql.org/message-id/60610df1-c32f-ebdf-e58c-7a664431f452%40enterprisedb.com

--
regards,
Andrei Lepikhov
Postgres Professional

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shlok Kyal 2023-12-26 12:09:47 Re: Intermittent failure with t/003_logical_slots.pl test on windows
Previous Message Richard Guo 2023-12-26 11:23:02 An improvement on parallel DISTINCT