Re: Remove useless GROUP BY columns considering unique index

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Zhang Mingli <zmlpostgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Remove useless GROUP BY columns considering unique index
Date: 2024-09-12 01:43:42
Message-ID: CAApHDvqHW0wo8rFzrtMgYa=bkxjfKOxSjiz_QqO5najWHjoRRg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 30 Dec 2023 at 04:05, Zhang Mingli <zmlpostgres(at)gmail(dot)com> wrote:
> So my patch make it easy: check unique index’s columns, it’s a valid candidate if all of that have NOT NULL constraint.
> And we choose a best one who has the least column numbers in get_min_unique_not_null_attnos(), as the reason: less columns mean that more group by columns could be removed.

This patch no longer applies. We no longer catalogue NOT NULL
constraints, which this patch is coded to rely upon. (Likely it could
just look at pg_attribute.attnotnull instead)

Aside from that, I'm not sure about the logic to prefer removing
functionally dependent columns using the constraint with the least
columns. If you had the PRIMARY KEY on two int columns and a unique
index on a 1MB varlena column, I think using the primary key would be
a better choice to remove functionally dependent columns of. Maybe
it's ok to remove any functionally dependant columns on the primary
key first and try removing functionally dependent columns on unique
constraints and a 2nd step (or maybe only if none can be removed using
the PK?)

Also, why constraints rather than unique indexes? You'd need to check
for expression indexes and likely ignore those due to no ability to
detect NOT NULL for expressions.

Also, looking at the patch to how you've coded
get_min_unique_not_null_attnos(), it looks like you're very optimistic
about that being a constraint that has columns mentioned in the GROUP
BY clause. It looks like it won't work if the UNIQUE constraint with
the least columns gets no mention in the GROUP BY clause. That'll
cause performance regressions from what we have today when the primary
key is mentioned and columns can be removed using that but a UNIQUE
constraint exists which has no corresponding GROUP BY columns and
fewer unique columns than the PK. That's not good and it'll need to be
fixed.

Set to waiting on author.

David

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-09-12 02:02:59 Re: Remove shadowed declaration warnings
Previous Message Tatsuo Ishii 2024-09-12 01:41:07 Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options