Re: Remove useless GROUP BY columns considering unique index

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Zhang Mingli <zmlpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Remove useless GROUP BY columns considering unique index
Date: 2024-11-22 13:24:12
Message-ID: CACJufxFZn3VAuAPnvVWZ3pk6Kn_zHMJh=n0dpw6gu7B0B-Y3vg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 12, 2024 at 9:44 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Sat, 30 Dec 2023 at 04:05, Zhang Mingli <zmlpostgres(at)gmail(dot)com> wrote:
> > So my patch make it easy: check unique index’s columns, it’s a valid candidate if all of that have NOT NULL constraint.
> > And we choose a best one who has the least column numbers in get_min_unique_not_null_attnos(), as the reason: less columns mean that more group by columns could be removed.
>
> This patch no longer applies. We no longer catalogue NOT NULL
> constraints, which this patch is coded to rely upon. (Likely it could
> just look at pg_attribute.attnotnull instead)
>
> Aside from that, I'm not sure about the logic to prefer removing
> functionally dependent columns using the constraint with the least
> columns. If you had the PRIMARY KEY on two int columns and a unique
> index on a 1MB varlena column, I think using the primary key would be
> a better choice to remove functionally dependent columns of. Maybe
> it's ok to remove any functionally dependant columns on the primary
> key first and try removing functionally dependent columns on unique
> constraints and a 2nd step (or maybe only if none can be removed using
> the PK?)
>

Let's be conservative.
if the primary key is there, then using it to
remove other useless columns;
if not there. found out the unique-index-not-null columns,
using it to remove other columns in the groupby clause.

> Also, why constraints rather than unique indexes? You'd need to check
> for expression indexes and likely ignore those due to no ability to
> detect NOT NULL for expressions.
>
some unique indexes don't have pg_constraint entry.
for example:
create table t(a int);
create unique index idx_t on t(a);

overall i come up with the attached patch.
instead of loop through pg_constraint, i loop over pg_index, extract indkey.
multiple indkey can match again groupby expression.
doing some loop, found out the indkey that has a minimum number of columns.
for example:
unique-not-null columns can be
(a,b)
(a,b,c)
since (a,b) only has two columns, using (a,b) to remove other columns
in the groupby expression.

some tests are copied from Zhang Mingli.
Code implementation is quite different.

Attachment Content-Type Size
v2-0001-remove-useless-group-by-columns-via-unique-index.patch text/x-patch 13.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2024-11-22 13:26:56 Re: Changed behavior in rewriteheap
Previous Message Rahila Syed 2024-11-22 13:03:36 Re: Enhancing Memory Context Statistics Reporting