Quick Links

A counter argument about DISTINCT and GROUP BY in PostgreSQL

From:	dterrors(at)hotmail(dot)com
To:	pgsql-general(at)postgresql(dot)org
Subject:	A counter argument about DISTINCT and GROUP BY in PostgreSQL
Date:	2008-01-01 19:33:16
Message-ID:	86ad02d8-37c6-441a-8af5-824dde5c10b2@p69g2000hsa.googlegroups.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

I've just spent a few hours searching and reading about the postgres
way of selecting distinct records. I understand the points made about
the ORDER BY limitation of DISTINCT ON, and the abiguity of GROUP BY,
but I think there's a (simple, common) case that have been missed in
the discussion. Here is my sitation:

table "projects":
id title more stuff (pretend there's 20 more columns.)
-----------------------------------------------------------
1 buildrome moredata inothercolumns
2 housework evenmoredata letssay20columns

table "todos":
id projectid name duedate
-----------------------------------------
1 1 conquer 1pm
2 1 laybricks 10pm
3 2 dolaundry 5pm

In english, I want to "select projects and order them by the ones that
have todos due the soonest." Does that sound like a reasonable
request?

This won't work in postgresql 8.2.4:

select distinct on(a.id) a.* from projects a inner join todos b on
b.projectid = a.id order by b.duedate offset 10 limit 20;
ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY
expressions

What to do? I could:

Option A. select distinct on(b.duedate, a.id) a.* from projects a
inner join todos b on b.projectid = a.id order by b.duedate, a.id
offset 10 limit 20;

But that's not equivalent because the duedates will break uniqueness.
The results will have two records where the a.id is 1. So I guess
that's a non-option.

Option B. I could try group by:
select a.* from projects a inner join todos b on b.projectid = a.id
group by a.id order by b.duedate offset 10 limit 20;
Query failed: ERROR: column "a.title" must appear in the GROUP BY
clause or be used in an aggregate function

(And presumably every other column of projects would too).

Option C. I could list out each and every one of the 20+ columns of
the project table in MAX() functions so that they'd be "in an
aggregate function":
select MAX(a.title) as title, MAX(a.column2) as column2,
MAX(a.column3) as column3, MAX(a.column4) as column4, MAX(a.column5)
as column5, MAX(a.column6) as column6, MAX(a.column7) as column7,
MAX(a.column8) as column8, MAX(a.column9) as column9, MAX(a.column8)
as ihatepostgresql, MAX(a.column9) as imgoingbacktomysql, from
projects a inner join todos b on b.projectid = a.id group by a.id
order by b.duedate offset 10 limit 20;

http://groups.google.com/group/comp.databases.postgresql.general/msg/923aed5b8ea5faa1

Option D: select distinct a.* from projects a inner join todos b on
b.projectid = a.id order by b.duedate offset 10 limit 20;
Query failed: ERROR: for SELECT DISTINCT, ORDER BY expressions must
appear in select list

But I don't want b.duedate in my results. And if I add it, it will,
again, break the desired uniqueness (see option A). (And, actually
even if that one worked, it would suck. Instead of making postgres
compare only one column to determine uniqueness, it now has to compare
all of them. (In other words, the whole advantage of DISTINCT ON is
out the window).)

Option E: I could use a subselect. But notice my offset, limit. If I
use a subselect, then postgresql would have to build ALL of the
results in memory (to create the subselect virtual table), before I
apply the offset and limit on the subselect.

Any suggestion would be appreciated.

BTW for those of you who are curious, in mysql (that other db), this
would be:

select a.* from projects a inner join todos b on b.projectid = a.id
group by a.id order by b.duedate limit 10,20;

Responses

Re: A counter argument about DISTINCT and GROUP BY in PostgreSQL at 2008-01-04 16:48:04 from Gregory Stark
Re: A counter argument about DISTINCT and GROUP BY in PostgreSQL at 2008-01-04 16:50:52 from Alvaro Herrera

Browse pgsql-general by date

	From	Date	Subject
Next Message	Matthew Wilson	2008-01-02 01:00:26	Need help requiring uniqueness in text columns
Previous Message	Andrew Sullivan	2008-01-01 17:40:55	Re: basic questions: Postgres with yum on CentOS 5.1