Re: Distinct on a non-sort column

From: Cstdenis <lists(at)on-track(dot)ca>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Distinct on a non-sort column
Date: 2011-11-05 21:48:34
Message-ID: 4EB5AF32.1080404@on-track.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 11/5/2011 12:49 PM, Tom Lane wrote:
> Cstdenis<lists(at)on-track(dot)ca> writes:
>> I am trying to write a query that selects recent submissions (sorted by
>> submission_date) but only selects the most recent one for each user_id.
>> example query: /select distinct on (user_id) * from stories order by
>> date_submitted desc limit 10;/
>> However postgres will not allow me to filter out duplicate rows with
>> distinct unless I sort on that column, which would product useless
>> results for me.
> Do the DISTINCT ON in a sub-query, with an ORDER BY appropriate for that
> task, and then re-sort the rows the way you want them presented in the
> outer query.
>
> SELECT ... FROM
> (SELECT DISTINCT ON ... ORDER BY ...) ss
> ORDER BY ...;
>
> regards, tom lane

If I understand that you are proposing as

select * from
(select distinct on (user_id) * from stories as s order
by user_id) as foo
order by date_submitted desc limit 10;

I think it has the problem of which of the stories by that user is
selected is random rather than the most recent being guaranteed (because
the distinct is done before the sort). Or am I misunderstanding this?

The suggestions by others of using max(date_submitted) may be a good
workaround for this, but I also need to do the same thing sorted by a
calculated score value which I do not think will be sufficiently unique
for Tair's suggestion.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2011-11-05 23:11:31 Re: Distinct on a non-sort column
Previous Message Antonio Goméz Soto 2011-11-05 21:37:24 Re: What is *wrong* with this query???