Quick Links

Re: Remove useless GROUP BY columns considering unique index

From:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
To:	David Rowley <dgrowleyml(at)gmail(dot)com>
Cc:	Zhang Mingli <zmlpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, jian he <jian(dot)universality(at)gmail(dot)com>
Subject:	Re: Remove useless GROUP BY columns considering unique index
Date:	2024-12-02 04:18:42
Message-ID:	480861d3-a250-4ec6-885a-75a099ecadf7@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 29/11/2024 14:28, David Rowley wrote:
> On Fri, 29 Nov 2024 at 19:36, Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>> 1. Thread reference in the patch comment doesn't work.
>> 2. May it be possible to move remove_useless_groupby_columns in the
>> second commit? It would be easier to review the differences.
>
> For #1, I rewrote the commit message.
>
> I've split into two patches for ease of review. See attached.
Thanks!
Patch 0001 seems OK. The differentiation of 'planmain' and 'initsplan'
is not clear for me, but code movement seems correct. I should say that
comment in initsplan.c:
'... initialization routines'
looks strange. Is this module about initialisation now?

Patch 0002 looks helpful and performant. I propose to check 'relid > 0'
to avoid diving into 'foreach(lc, parse->rtable)' at all if nothing has
been found.
It may restrict future optimisations like [1], where the upper query
block can employ the 'uniqueness' of grouped clauses in selectivity
estimations. But I think it is OK for now.

NOTES:
1. Uniqueness is proved by a UNIQUE index. It might be a bit more
sophisticated to probe not only grouping qual but also employ
equivalence classes. In that case, we could process something like that:

CREATE TABLE test (
x int NOT NULL, y int NOT NULL, z int NOT NULL, w int);
CREATE UNIQUE INDEX ON test(y,z);
SELECT t2.z FROM test t2, test t1 WHERE t1.y=t2.y
GROUP BY t1.y,t2.z,t2.w;

2. The same idea could be used with DISTINCT statement. Right now we have:
SELECT DISTINCT y,z,w FROM test;

HashAggregate
Output: y, z, w
Group Key: test.y, test.z, test.w
-> Seq Scan on public.test
Output: x, y, z, w

[1]
https://www.postgresql.org/message-id/flat/50fe6779-ee2d-4256-bc64-cd661bc4029a%40gmail.com

--
regards, Andrei Lepikhov

In response to

Re: Remove useless GROUP BY columns considering unique index at 2024-11-29 07:28:59 from David Rowley

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	jian he	2024-12-02 04:22:09	Re: Remove useless GROUP BY columns considering unique index
Previous Message	Amit Kapila	2024-12-02 04:14:01	Re: Conflict detection for update_deleted in logical replication