From: | Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
Cc: | Richard Guo <guofenglinux(at)gmail(dot)com>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru> |
Subject: | Re: POC: GROUP BY optimization |
Date: | 2024-04-18 05:01:13 |
Message-ID: | 50c1c1dc-057c-421b-a1f4-62622c7afe61@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 4/12/24 06:44, Tom Lane wrote:
> * It seems like root->processed_groupClause no longer has much to do
> with the grouping behavior, which is scary given how much code still
> believes that it does. I suspect there are bugs lurking there, and
1. Analysing the code, processed_groupClause is used in:
- adjust_foreign_grouping_path_cost - to decide, do we need to add cost
of sort to the foreign grouping.
- just for replacing relids during SJE process.
- create_groupingsets_plan(), preprocess_grouping_sets,
remove_useless_groupby_columns - we don't apply this feature in the case
of grouping sets.
- get_number_of_groups - estimation shouldn't depend on the order of
clauses.
- create_grouping_paths - to analyse hashability of grouping clause
- make_group_input_target, make_partial_grouping_target - doesn't depend
on the order of clauses
planner.c: add_paths_to_grouping_rel, create_partial_grouping_paths - in
the case of hash grouping.
So, the only case we can impact current behavior lies in the
postgres_fdw. But here we don't really know which plan will be chosen
during planning on foreign node and stay the same behavior is legal for me.
> am not comforted by the fact that the patch changed exactly nothing
> in the pathnodes.h documentation of that field. This comment looks
> pretty silly now too:
>
> /* Preprocess regular GROUP BY clause, if any */
> root->processed_groupClause = list_copy(parse->groupClause);
>
> What "preprocessing" is going on there? This comment was adequate
> before, when the code line invoked preprocess_groupclause which had
> a bunch of commentary; but now it has to stand on its own and it's
> not doing a great job of that.
Thanks for picking this place. I fixed it.
More interesting here is that we move decisions on the order of group-by
clauses to the stage, where we already have underlying subtree and
incoming path keys. In principle, we could return the preprocessing
routine and arrange GROUP-BY with the ORDER-BY clause as it was before.
But looking into the code, I found only one reason to do this:
adjust_group_pathkeys_for_groupagg. Curious about how much profit we get
here, I think we can discover it later with no hurry. A good outcome
here will be a test case that can show the importance of arranging
GROUP-BY and ORDER-BY at an early stage.
--
regards,
Andrei Lepikhov
Postgres Professional
Attachment | Content-Type | Size |
---|---|---|
minor_comment.patch | text/x-patch | 656 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | Imseih (AWS), Sami | 2024-04-18 05:05:03 | Re: allow changing autovacuum_max_workers without restarting |
Previous Message | Nathan Bossart | 2024-04-18 04:17:12 | improve performance of pg_dump --binary-upgrade |