Re: Remove useless GROUP BY columns considering unique index

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: Zhang Mingli <zmlpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Remove useless GROUP BY columns considering unique index
Date: 2024-11-29 02:02:39
Message-ID: CAApHDvqLezKwoEBBQd0dp4Y9MDkFBDbny0f3SzEeqOFoU7Z5+A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 29 Nov 2024 at 01:33, jian he <jian(dot)universality(at)gmail(dot)com> wrote:
>
> On Thu, Nov 28, 2024 at 2:11 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > I think it'll make more sense to adjust some of the
> > existing tests to use a unique constraint instead of a PK and then
> > adjust a column's NOT NULL property to check that part of the code is
> > working correctly.
> >
> looking around, i inserted some tests to indexing.sql,
> create_index.sql, aggregates.sql
> also added tests for sort by index oid.

I meant the tests added by d4c3a156c. I don't think there's any need
to have tests in more than one file.

> > Another issue with this is that the list of indexes being used is not
> > sorted by Oid.

> When there are multiple matches, we need a determined way to choose which one.
> so please check attached.

Looking closer at get_relation_info(), it looks like the
RelOptInfo.indexlist will be in descending Oid order, per:

/*
* We've historically used lcons() here. It'd make more sense to
* use lappend(), but that causes the planner to change behavior
* in cases where two indexes seem equally attractive. For now,
* stick with lcons() --- few tables should have so many indexes
* that the O(N^2) behavior of lcons() is really a problem.
*/
indexinfos = lcons(info, indexinfos);

I'm starting to feel like we might be trying to include too much logic
to mimic this behaviour. The main problem is that the RelOptInfos
have not yet been created for the given relation when
remove_useless_groupby_columns() is called. That's why we're having to
query the catalogue tables like this. I do see a comment in
preprocess_groupclauses() that mentions:

* Note: we return a fresh List, but its elements are the same
* SortGroupClauses appearing in parse->groupClause. This is important
* because later processing may modify the processed_groupClause list.

So there's already a warning that there may still be future changes to
the processed_groupClause. It'd be nice if we could delay calling
remove_useless_groupby_columns() until the RelOptInfos are built as
we'd not have to query the catalogue tables to make this work. The
soonest point is just after the call to add_base_rels_to_query() in
query_planner(). That would mean doing more work on the
processed_groupClause in query_planner() rather than in
grouping_planner(), which is a little strange. It looks like the first
time we look at processed_groupClause and make a decision based on the
actual value of it is in standard_qp_callback(), i.e. after
add_base_rels_to_query().

I've attached an updated patch that gets rid of the
get_unique_not_null_attnos() function completely and uses the
RelOptInfo.indexlist and RelOptInfo.notnullattnums. I also realised
that there's no need to check for NOT NULL constraints with unique
indexes defined with NULLS NOT DISTINCT as having NULLs unique means
we can still determine the functional dependency. One other possibly
good outcome of this is that it's much easier to do the
pg_statistic.avgwidth thing I talked about as RelOptInfo stores those
in attr_widths, however, looking at
table_block_relation_estimate_size(), I see we might not always
populate those.

I ended up re-homing remove_useless_groupby_columns() in initsplan.c.
I couldn't see anywhere else it fitted.

David

Attachment Content-Type Size
v8-0001-remove-useless-group-by-columns-via-unique-not-nu.patch application/octet-stream 20.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-11-29 02:05:51 Memory leak in WAL sender with pgoutput (v10~)
Previous Message Tom Lane 2024-11-29 01:42:47 Re: Changing shared_buffers without restart