Quick Links

Re: Why does GROUP BY reduce number of rows?

From:	Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To:	Rocky Ji <rockyji3438(at)gmail(dot)com>
Cc:	pgsql-sql(at)lists(dot)postgresql(dot)org
Subject:	Re: Why does GROUP BY reduce number of rows?
Date:	2019-03-01 13:25:53
Message-ID:	87y35yd9ul.fsf@news-spur.riddles.org.uk
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-sql

>>>>> "Rocky" == Rocky Ji <rockyji3438(at)gmail(dot)com> writes:

Rocky> Let's call this error/mistake a "UDFS" (Unintended Deviation
Rocky> From Standards).

What deviation? If the standard leaves something unspecified, the
PostgreSQL can choose to do as it pleases. (We can also choose to just
violate the standard...)

Rocky> I think: reduction in number of rows in output when `GROUP BY`
Rocky> clause is used is another UDFS.

Reducing the number of rows of output is the SOLE PURPOSE AND REASON TO
EXIST of the GROUP BY clause. By definition, the clause produces only
one output row for each matching group in the input.

The rows produced by GROUP BY are also not "the final row" or even "the
first row" of the group. They are _new_ rows computed from column
expressions which must reference the input in only two ways:

- aggregate functions computed over the input group

- values which can be proved by the implementation to be guaranteed to
be the same in all rows of the input group (that is, the grouping
columns themselves and anything the implementation can prove is a
functional dependency of the grouping columns).

Some other databases may be lax about enforcing these restrictions and
allow non-dependent columns to be included in the result, taking their
values from some arbitrary row of the input group. PG if anything is
_too_ strict about this, since for implementation reasons we can prove
functional dependency only in very limited cases and we reject anything
else.

Rocky> What if important (i.e. "uniquely identifying" for "human eyes")
Rocky> data was contained in first row of a group. Why does the
Rocky> underlying program (psql server, I suppose) **decide** that
Rocky> preceding rows are not worth displaying?

The person who wrote GROUP BY in the query made that decision.

BTW, if you want to calculate aggregated values across rows without
reducing the number of rows of output, there's an entirely separate
mechanism for that: window functions.

--
Andrew (irc:RhodiumToad)

In response to

Why does GROUP BY reduce number of rows? at 2019-03-01 12:55:44 from Rocky Ji

Browse pgsql-sql by date

	From	Date	Subject
Next Message	Lou	2019-03-06 19:59:22	INSERT / UPDATE into 2 inner joined table simultaneously
Previous Message	Achilleas Mantzios	2019-03-01 13:17:48	Re: Top 3 values for each group in PGSQL