Re: Side effect of remove_useless_groupby_columns

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Side effect of remove_useless_groupby_columns
Date: 2021-02-28 09:14:59
Message-ID: CAApHDvq0HxnUJgz=X7mzL3c9z5NyGAm8aZoQqrGk0E3kEbXePg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, 28 Feb 2021 at 20:52, Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> When looking at [1], I realized we may have a side effect when removing
> redundant columns in the GROUP BY clause. Suppose we have a query with
> ORDER BY 'b', and meanwhile column 'b' is also a group key. If we decide
> that 'b' is redundant due to being functionally dependent on other GROUP
> BY columns, we would remove it from group keys. This will make us lose
> the opportunity to leverage the index on 'b'.

> Any thoughts?

I had initially thought that this was a non-issue before incremental
sort was added, but the following case highlights that's wrong.

# create table t (a int not null, b int);
# insert into t select i, i%1000 from generate_series(1,1000000)i;
# create index on t(b,a);

# explain (analyze) select b from t group by a, b order by b,a limit 10;
# alter table t add constraint t_pkey primary key (a);
# explain (analyze) select b from t group by a, b order by b,a limit 10;

Execution slows down after adding the primary key since that allows
the functional dependency to be detected and the "b" column removed
from the GROUP BY clause.

Perhaps we could do something like add:

diff --git a/src/backend/optimizer/plan/planner.c
b/src/backend/optimizer/plan/planner.c
index 545b56bcaf..7e52212a13 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3182,7 +3182,8 @@ remove_useless_groupby_columns(PlannerInfo *root)
if (!IsA(var, Var) ||
var->varlevelsup > 0 ||
!bms_is_member(var->varattno -
FirstLowInvalidHeapAttributeNumber,
-
surplusvars[var->varno]))
+
surplusvars[var->varno]) ||
+ list_member(parse->sortClause, sgc))
new_groupby = lappend(new_groupby, sgc);
}

to remove_useless_groupby_columns(). It's not exactly perfect though
since it's still good to remove the useless GROUP BY columns if
there's no useful index to implement the ORDER BY. We just don't know
that when we call remove_useless_groupby_columns().

FWIW, these also don't really seem to be all that great examples since
it's pretty useless to put the primary key columns in the GROUP BY
unless there's some join that's going to duplicate those columns.
Having the planner remove the GROUP BY completely in that case would
be nice. That's a topic of discussion in the UniqueKeys patch thread.
However, it is possible to add a join with a join condition matching
the ORDER BY and have the same problem when a join type that preserves
the outer path's order is picked.

David

> [1] https://www.postgresql.org/message-id/flat/16869-26346b77d6ccaeec%40postgresql.org

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Paul Förster 2021-02-28 09:57:32 proposal: psql –help reflecting service or URI usage
Previous Message Richard Guo 2021-02-28 07:52:24 Side effect of remove_useless_groupby_columns