From: | Kevin Brannen <kevinb(at)nurseamerica(dot)net> |
---|---|
To: | Jeremy Cowgar <develop(at)cowgar(dot)com> |
Cc: | postgres list <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: How to get rid of dups... |
Date: | 2002-07-11 16:56:15 |
Message-ID: | 3D2DB8AF.30901@nurseamerica.net |
Lists: | pgsql-general |
Jeremy Cowgar wrote:
> I need to get rid of all rows that have dups in the columns
> tpa,pun,grn,claim ... i.e.
>
> 1--- 001 001 001 00-000001 John Doe
> 2--- 001 001 001 00-000001 Jane Doe
> 3--- 001 002 001 00-000001 John Doe
>
> 1 and 2 would be dups, 1 and 3 are diff records, 2 and 3 are diff
> records.
>
> I tried this as a test:
>
> select count(claimid), tpa, pun, grn, claim FROM claim_import GROUP BY
> tpa, pun, grn, claim HAVING count(claimid) > 1;
> 26 rows returned.
>
> then
>
> select distinct on (tpa,pun,grn,claim) count(claimid), tpa, pun, grn,
> claim FROM claim_import GROUP BY tpa, pun, grn, claim HAVING
> count(claimid) > 1;
It's not obvious to me what your key is (all four columns?), but this is
a place where self-joins are useful. Assuming a table like:
    create table stuff (
        id int,     -- primary table key
        value int,  -- unique data key
        ...);
You should be able to find the dups with something like:
    select b.id
    from stuff a, stuff b
    where a.value = b.value
    and a.id < b.id;
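Adapted to the table in the question, the same self-join might look like the sketch below. It makes several assumptions: claimid is taken to be a unique integer key (the original post only shows it being counted), the column types and the name column are guesses based on the three sample rows, and SQLite (via Python's sqlite3 module) stands in for PostgreSQL so that the snippet can be run as-is.

```python
import sqlite3

# In-memory SQLite database standing in for the real PostgreSQL one.
conn = sqlite3.connect(":memory:")

# Hypothetical layout: column types and the name column are assumptions.
conn.execute(
    "create table claim_import ("
    " claimid integer primary key, tpa text, pun text, grn text,"
    " claim text, name text)"
)

# The three sample rows from the question.
conn.executemany(
    "insert into claim_import values (?, ?, ?, ?, ?, ?)",
    [
        (1, "001", "001", "001", "00-000001", "John Doe"),
        (2, "001", "001", "001", "00-000001", "Jane Doe"),
        (3, "001", "002", "001", "00-000001", "John Doe"),
    ],
)

# Self-join on all four key columns. The a.claimid < b.claimid condition
# means the row with the lowest claimid in each group is never returned.
dup_ids = [
    row[0]
    for row in conn.execute(
        "select distinct b.claimid"
        " from claim_import a, claim_import b"
        " where a.tpa = b.tpa and a.pun = b.pun"
        " and a.grn = b.grn and a.claim = b.claim"
        " and a.claimid < b.claimid"
        " order by b.claimid"
    )
]

print(dup_ids)  # [2]
```

Only row 2 comes back: it matches row 1 on all four columns, while row 3 differs in pun. Note that rows with a NULL in any of the key columns will not match each other with plain = comparisons.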
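Given that, use it as a subquery to delete the duplicates. Because of the
a.id < b.id condition, the row with the lowest id in each group is never
selected, so one copy of each survives:
    delete from stuff
    where id in (select b.id from stuff a, stuff b where ...);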
Be careful and experiment with the select until you're 110% sure you
like what you see. :-) Adapt this approach to your real table and you
should be set.
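One way to do that experimenting is to run the select on its own first, look at the ids, and only then feed the same query to the delete. The sketch below is a minimal illustration, not a drop-in solution: the sample rows are made up, the stuff table has only the two columns named above, and SQLite (via Python's sqlite3 module) stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table stuff (id integer primary key, value integer)")

# Made-up sample data: value 10 appears three times, value 30 twice.
conn.executemany(
    "insert into stuff values (?, ?)",
    [(1, 10), (2, 10), (3, 20), (4, 10), (5, 30), (6, 30)],
)

# The self-join that finds every duplicate except the lowest id per value.
dup_query = (
    "select b.id from stuff a, stuff b"
    " where a.value = b.value and a.id < b.id"
)

# Step 1: preview the ids that would be removed.
to_delete = sorted({row[0] for row in conn.execute(dup_query)})
print(to_delete)  # [2, 4, 6]

# Step 2: once the preview looks right, reuse the same query in the delete.
cur = conn.execute(f"delete from stuff where id in ({dup_query})")
deleted = cur.rowcount

remaining = conn.execute("select id, value from stuff order by id").fetchall()
print(deleted, remaining)  # 3 [(1, 10), (3, 20), (5, 30)]

conn.commit()
```

On the real database, running the delete inside a transaction (BEGIN, then ROLLBACK if the result looks wrong) gives one more safety net before anything is committed.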
HTH,
Kevin