Re: Select unique set of record, distinct on, please help!!!

From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: worker <tzhai2007(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Select unique set of record, distinct on, please help!!!
Date: 2008-05-17 04:33:06
Message-ID: dcc563d10805162133h4fcf5837p3db86ebc23122042@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, May 16, 2008 at 10:13 AM, worker <tzhai2007(at)gmail(dot)com> wrote:
> Dear all,
> I am still new to the PostGres DB, I 've been using other DB for
> quite a while, so SQL is OK with me. I am running into problem trying
> to condense duplicate records in a table into one record. Naturally, I
> think about 'DISTINCT ON' in select, but I just couldn't figure out
> the proper syntax to use it correctly. So, please help.
>
> My table is as follow:
> Column | Type |
> Modifiers
> -----------------+-------------------------------
> +-------------------------------------------------------
> tag_id | character varying(15) |
> enddate | date |
> storetcodei | integer |
> gtlt | character(1) |
> value | real |
> autoid | integer | not null default
> nextval('results_autoid_seq'::regclass)
>
>
> The main issue is that a lot of records have the same 'tag_id' and
> 'enddate', which is redundant. So I was trying to get rid of all the
> duplicates, but just leave one copy in the database, simple you say,
> right? NOT!
>
> I used many variants of the following query, but it seems psql is
> not doing the right thing, it always gotten rid of all duplicate, not
> leaving even one record from all the duplicates, in other db, all I
> need is 'unique', so please could someone just give me a SELECT INOT
> or CREATE TABLE AS statements that can rid of the duplicate but leave
> one copy in the table.
>
> create table results1 as select distinct on (tag_id,enddate) * from
> resultsall;

What exactly happens that's wrong. I've used something like that
before and it should work.

Another possibility is to do it in place. With transactions you can
always try something, check it, and roll back as needed.

let's say you have an id field that IS unique, we'll call it id (you
can create one if you don't have one)

alter table results1 add id int;
create temp sequence myseq;
update results1 set id=nextval('myseq');
select =id from results1 r1 join results r2 on (r1.tag_id=r2.tag and
r1.enddate=r2.enddate and r1.id > r2.id);

That query should return all the max ids of the matching rows. The
id>id is the "order by" here. You could replace it with some other
comparison for ordering.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Francisco Figueiredo Jr. 2008-05-17 05:38:16 Re: How to create a function with multiple RefCursor OUT parameters
Previous Message Tom Lane 2008-05-17 03:01:57 Re: Function to return both integer and SETOF refcursor