Quick Links

Re: GROUP BY DISTINCT

From:	Vik Fearing <vik(at)postgresfriends(dot)org>
To:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Georgios Kokolatos <gkokolatos(at)protonmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc:	Erik Rijkers <er(at)xs4all(dot)nl>
Subject:	Re: GROUP BY DISTINCT
Date:	2021-03-16 08:21:03
Message-ID:	3817d5ec-f3e4-e4b9-d169-271ae5d3cbe4@postgresfriends.org
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-docs pgsql-hackers

On 3/13/21 12:33 AM, Tomas Vondra wrote:
> Hi Vik,
>
> The patch seems quite ready, I have just two comments.

Thanks for taking a look.

> 1) Shouldn't this add another <indexterm> for DISTINCT, somewhere in the
> documentation? Now the index points just to the SELECT DISTINCT part.

Good idea; I never think about the index.

> 2) The part in gram.y that wraps/unwraps the boolean flag as an integer,
> in order to stash it in the group lists is rather ugly, IMHO. It forces
> all the places handling the list to be aware of this (there are not
> many, but still ...). And there are no other places doing (bool) intVal
> so it's not like there's a precedent for this.

There is kind of a precedent for it, I was copying off of TriggerEvents
and func_alias_clause.

> I think the clean solution is to make group_clause produce a struct with
> two fields, and just use that. Not sure how invasive that will be
> outside gram.y, though.

I didn't want to create a whole new parse node for it, but Andrew Gierth
pointed me towards SelectLimit so I did it like that and I agree it is
much cleaner.

> Also, the all_or_distinct vs. distinct_or_all seems a bit error-prone. I
> wonder if we can come up with some clearer names, describing the context
> of those types.

I turned this into an enum for ALL/DISTINCT/default and the caller can
choose what it wants to do with default. I think that's a lot cleaner,
too. Maybe DISTINCT ON should be changed to fit in that? I left it
alone for now.

I also snuck in something that all of us overlooked which is outputting
the DISTINCT in ruleutils.c. I didn't add a test for it but that would
have been an unfortunate bug.

New patch attached, rebased on 15639d5e8f.
--
Vik Fearing

Attachment	Content-Type	Size
0001-implement-GROUP-BY-DISTINCT.v04.patch	text/x-patch	23.9 KB

In response to

Re: GROUP BY DISTINCT at 2021-03-12 23:33:36 from Tomas Vondra

Responses

Re: GROUP BY DISTINCT at 2021-03-16 14:52:52 from Tomas Vondra

Browse pgsql-docs by date

	From	Date	Subject
Next Message	Tomas Vondra	2021-03-16 14:52:52	Re: GROUP BY DISTINCT
Previous Message	Jürgen Purtz	2021-03-15 08:06:51	Re: Change JOIN tutorial to focus more on explicit joins

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Fujii Masao	2021-03-16 08:29:05	Re: fdatasync performance problem with large number of DB files
Previous Message	Amit Kapila	2021-03-16 08:20:46	Re: pg_subscription - substream column?