From: | "Markus Wollny" <Markus(dot)Wollny(at)computec(dot)de> |
---|---|
To: | "CSN" <cool_screen_name90001(at)yahoo(dot)com> |
Cc: | <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Finding (and deleting) dupes in relation table |
Date: | 2005-09-23 09:59:15 |
Message-ID: | 2266D0630E43BB4290742247C89105750813EC8D@dozer.computec.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hello!
CSN [cool_screen_name90001(at)yahoo(dot)com] wrote:
> I have a table that relates id's of two other tables:
>
> table1id, table2id
>
> Dupes have found their way into it (create unique index
> across both fields fails). Is there a quick and easy way to
> find and delete the dupes (there are tens of thousands of records)?
>
> Thanks,
> CSN
If your table was created WITH OIDS you could identify the duplicates
thus:
select a.table1id
, a.table12d
, max(a.oid) as maxoid
, count(a.oid) as coid
from schema.mytable a,
schema.mytable b
where a.table1id = b.table1id
and a.table2id=b.table2id
and a.oid <> b.oid
group by a.table1id, a.table2id
order by a.table1id;
If you wish to delete surplus rows, you might do the following:
delete from schema.mytable where oid in (
select maxoid from (
select a.table1id, a.table12d, max(a.oid) as
maxoid, count(a.oid) as coid
from schema.mytable a,
schema.mytable b
where a.table1id = b.table1id
and a.table2id=b.table2id
and a.oid <> b.oid
group by a.table1id, a.table2id
order by a.table1id ) as foo
where coid >1 );
This will delete the oldest tuple of a duplicate set of rows; if there
are more than two tuples in a set, you'll want to execute this a couple
of times until there's no duplicate left, as the delete will only reduce
a set by one tuple at a time. I'd also recommend to apply a PRIMARY KEY
constraint afterwards instead of just a unique index - this will prevent
NULL-entries as well as creating the desired unique index - and I think
it's good practice to have a primary key on about every table there is,
except when it's just a junk data table like a logging table where
content is regularly evaluated and discarded.
Kind regards
Markus
From | Date | Subject | |
---|---|---|---|
Next Message | Nicolas Barbier | 2005-09-23 10:26:35 | Re: Lines of code in PG 8.0 & 8.1? |
Previous Message | Hannes Dorbath | 2005-09-23 07:41:08 | Re: Data Entry Tool for PostgreSQL |