Re: Removing useless DISTINCT clauses

From: Jim Finnerty <jfinnert(at)amazon(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Removing useless DISTINCT clauses
Date: 2018-08-22 20:11:55
Message-ID: 1534968715834-0.post@n3.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

distinct_optimization_v6.patch
<http://www.postgresql-archive.org/file/t348990/distinct_optimization_v6.patch>

Here is an update to this thread, for potential inclusion in v12. I
couldn't get the most recent 'v5' patch to apply cleanly, so I recreated a
v6 patch on PG10.5 by hand, and made a few changes and improvements:

- (a) If there is only one relation and the PK is present in the SELECT
list, then the distinctClause can be removed. Thus, a query of the form
SELECT DISTINCT pk, * FROM t; recognizes that the body of the SELECT is
already distinct on pk, and doesn't do an unnecessary DISTINCT operation
- (b) If there is only one relation, the PK is present, and there are no
aggregates, then the groupByClause can similarly be removed
- (c) If DISTINCT ON is specified, but no ORDER BY is specified, then it
acts like a regular DISTINCT
- (d) If the distinct clause is modified, this fact is recorded in a new
bool in the Query struct so that when the distinct was modified,
create_distinct_paths no longer Asserts that the needed pathkeys are
contained in the distinct_pathkeys.
- (e) Since the new bool is in the Query struct,
copyfuncs/equalfuncs/outfuncs/readfuncs support is also provided. This will
cause a problem unless the pg_rewrite catalog is regenerated.

The pg_rewrite catalog contains a serialized representation of the Query
node in its ev_action column. If there is a way to recreate the contents of
the pg_rewrite relation without bumping the catversion, can someone please
explain how? If not, then this change is incomplete and would require a new
catalog version (catversion.h) too.

Additional work on this patch would be desirable. It should check for
unique + not null, in addition to just the pk constraint. The DISTINCT
could be eliminated in cases with multiple relations if all the joins are
1:1, although that would arguably belong in a different patch.

/Jim

p.s. the v6 patch works for the problem case that Tom Lane reported with the
v5 patch

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikita Glukhov 2018-08-22 21:29:17 Re: jsonpath
Previous Message Daniel Verite 2018-08-22 20:05:41 Re: csv format for psql