Re: SELECT DISTINCT ... ORDER BY problem

From: "David Rowley" <dgrowley(at)gmail(dot)com>
To: "'Madison Kelly'" <linux(at)alteeve(dot)com>, "'David Fetter'" <david(at)fetter(dot)org>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: SELECT DISTINCT ... ORDER BY problem
Date: 2008-12-09 01:15:10
Message-ID: 5D54869CF0584B7A97A78E886D24F107@amd64
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Madison Kelly Wrote:
> David Fetter wrote:
> > On Mon, Dec 08, 2008 at 11:16:29PM -0000, David Rowley wrote:
> >>> -----Original Message-----
> >>> From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-
> >>> owner(at)postgresql(dot)org] On Behalf Of Madison Kelly
> >>> Sent: 08 December 2008 22:19
> >>> To: pgsql-general(at)postgresql(dot)org
> >>> Subject: [GENERAL] SELECT DISTINCT ... ORDER BY problem
> >>>
> >>> Hi all,
> >>>
> >>> I've got a table that I am trying to SELECT DISTINCT on one column
> >>> and ORDER BY on a second column, but am getting the error:
> >>>
> >>> SELECT DISTINCT ON expressions must match initial ORDER BY expressions
> >>>
> >>> I can't add the second column to the DISTINCT clause because every
> >>> row is unique. Likewise, I can't add the first column to my ORDER BY
> as
> >>> it'd not sort the way I need it to.
> >>>
> >>> Here is a simplified version of my query:
> >>>
> >>> \d table
> >>> Table "table"
> >>> Column | Type | Modifiers
> >>>
> >>> -----------------+---------+------------------------------------------
> ----
> >>> --
> >>> tbl_id | integer | not null default
> >>> nextval('tbl_seq'::regclass)
> >>> foo | text |
> >>> bar | text |
> >>>
> >>> SELECT DISTINCT ON (foo) foo, bar FROM table WHERE bar < '2008-12-07
> >>> 16:32:46' AND tbl_id=153 ORDER BY bar LIMIT 1;
> >>>
> >> To make the query valid you would have to ORDER BY foo,bar
> >> DISTINCT ON in this case is only going to show the first bar value for
> each
> >> foo.
> >>
> >> Is tbl_id not your PK and only giving 1 row anyway?
> >>
> >>> I understand from:
> >>>
> >>> http://archives.postgresql.org/pgsql-sql/2007-02/msg00169.php
> >>>
> >>> That this is not really possible because the any given 'foo' column
> >>> could match multiple 'bar' columns, so what do you search by? However,
> >>> it's made some sort of decision as a value is shown in 'bar' for each
> >>> 'foo'.
> >>>
> >>> So my question is two-fold:
> >>>
> >>> 1. Can I not say, somehow, "sort all results by 'bar', and return the
> >>> first/last 'bar' for each distinct 'foo'?
> >>>
> >>> 2. Can I somehow say "Order the results using the value of 'bar' you
> >>> return, regardless of where it came from"?
> >> You can nest queries:
> >>
> >> SELECT foo,bar
> >> FROM (SELECT DISTINCT ON (foo) foo,
> >> Bar
> >> FROM table
> >> WHERE bar < '2008-12-07 16:32:46'
> >> AND tbl_id=153 ORDER BY foo,bar
> >> ) AS t ORDER BY bar;
> >>
> >> Notice that I'm only applying the final order by in the outer query.
> >
> > When we get windowing functions, a lot of this pain will go away :)
> >
> > Cheers,
> > David.
>
> Oh?
>
> I can't say I've been keeping up with what is in the pipes. What is
> windowing?
>

These are also known as analytical functions in some other database systems,
though by the standard they are known as window functions.

http://en.wikipedia.org/wiki/Select_(SQL)

It's worth a read. Hopefully we'll see this in 8.4.

David.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Adam Rich 2008-12-09 01:30:35 Re: SELECT DISTINCT ... ORDER BY problem
Previous Message David Rowley 2008-12-09 01:09:38 Re: SELECT DISTINCT ... ORDER BY problem