Quick Links

Remove useless GROUP BY columns considering unique index

From:	Zhang Mingli <zmlpostgres(at)gmail(dot)com>
To:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Remove useless GROUP BY columns considering unique index
Date:	2023-12-29 15:04:34
Message-ID:	327990c8-b9b2-4b0c-bffb-462249f82de0@Spark
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

This idea first came from remove_useless_groupby_columns does not need to record constraint dependencie[0] which points out that
unique index whose columns all have NOT NULL constraints could also take the work with primary key when removing useless GROUP BY columns.
I study it and implement the idea.

Ex:

create temp table t2 (a int, b int, c int not null, primary key (a, b), unique(c));

explain (costs off) select * from t2 group by a,b,c;
QUERY PLAN
----------------------
HashAggregate
Group Key: c
-> Seq Scan on t2

The plan drop column a, b as I did a little more.

For the query, as t2 has primary key (a, b), before this patch, we could drop column c because {a, b} are PK.
And we have an unique index(c) with NOT NULL constraint now, we could drop column {a, b}, just keep {c}.

While we have multiple choices, group by a, b (c is removed by PK) and group by c (a, b are removed by unique not null index)
And I implement it to choose the one with less columns so that we can drop as more columns as possible.
I think it’s good for planner to save some cost like Sort less columns.
There may be better one for some reason like: try to keep PK for planner?
I’m not sure about that and it seems not worth much complex.

The NOT NULL constraint may also be computed from primary keys, ex:
create temp table t2 (a int, b int, c int not null, primary key (a, b), unique(a));
Primary key(a, b) ensure a is NOT NULL and we have a unique index(a), but it will introduce more complex to check if a unique index could be used.
I also doubt it worths doing that..
So my patch make it easy: check unique index’s columns, it’s a valid candidate if all of that have NOT NULL constraint.
And we choose a best one who has the least column numbers in get_min_unique_not_null_attnos(), as the reason: less columns mean that more group by columns could be removed.

create temp table t3 (a int, b int, c int not null, d int not null, primary key (a, b), unique(c, d));
-- Test primary key beats unique not null index.
explain (costs off) select * from t3 group by a,b,c,d;
QUERY PLAN
----------------------
HashAggregate
Group Key: a, b
-> Seq Scan on t3
(3 rows)

create temp table t4 (a int, b int not null, c int not null, d int not null, primary key (a, b), unique(b, c), unique(d));
-- Test unique not null index with less columns wins.
explain (costs off) select * from t4 group by a,b,c,d;
QUERY PLAN
----------------------
HashAggregate
Group Key: d
-> Seq Scan on t4
(3 rows)

The unique Indices could have overlaps with primary keys and indices themselves.

create temp table t5 (a int not null, b int not null, c int not null, d int not null, unique (a, b), unique(b, c), unique(a, c, d));
-- Test unique not null indices have overlap.
explain (costs off) select * from t5 group by a,b,c,d;
QUERY PLAN
----------------------
HashAggregate
Group Key: a, b
-> Seq Scan on t5
(3 rows)

[0]https://www.postgresql.org/message-id/flat/CAApHDvrdYa%3DVhOoMe4ZZjZ-G4ALnD-xuAeUNCRTL%2BPYMVN8OnQ%40mail.gmail.com

Zhang Mingli
www.hashdata.xyz

Attachment	Content-Type	Size
v1-0001-Remove-useless-GROUP-BY-columns-considering-unique-i.patch	application/octet-stream	14.2 KB

Responses

Re: Remove useless GROUP BY columns considering unique index at 2023-12-31 14:37:46 from jian he
Re: Remove useless GROUP BY columns considering unique index at 2024-09-12 01:43:42 from David Rowley

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Bharath Rupireddy	2023-12-29 15:23:54	Re: introduce dynamic shared memory registry
Previous Message	Tomas Vondra	2023-12-29 14:32:18	Re: Fix Brin Private Spool Initialization (src/backend/access/brin/brin.c)