Quick Links

Re: How to idenity duplicate rows

From:	"Jan Cruz" <malebug(at)gmail(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Re: How to idenity duplicate rows
Date:	2006-03-20 02:08:09
Message-ID:	493da2780603191808q4759ee9dm95f6a5fb712992f3@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

>
> As for deleting all but one row in a duplicated group, you're going to
> have to get at them by the oid or ctid columns perhaps.
>
> The other idea is to run CREATE TABLE newtable AS SELECT DISTINCT * FROM
> oldtable;.

I believe getting oid and/or ctid is not possible since it would not
display/get duplicate records
in a "HAVING CLAUSE" since their oid/ctid wouldn't be the same.

And creating a newtable and use SELECT DISTINCT isn't an acceptable idea to
the audit team when you have millions of records in a table in production
server when you only need to remove let say 70 records from that table.

I tried another approach where I queried and insert the duplicate record (35
records) into
a new/temporary table. Then I created a stored procedure something like
this:
--START

DECLARE
foo table;

BEGIN
FOR foo IN
SELECT * FROM newtable
LOOP
DELETE FROM oldtable
where oldtable.field1 = foo.field1
and oldtable.field2 = foo.field2
and oldtable.field3 = foo.field3
and oldtable.field4 = foo.field4
...';
END LOOP;
END;

-- END

Problem with this approach I got 35 duplicate records (count = 2)
from the new table and delete only 20 records from the oldtable.
If I am not mistaken it should delete 70 records.

I wonder

In response to

Re: How to idenity duplicate rows at 2006-03-19 22:52:07 from Peter Eisentraut

Browse pgsql-general by date

	From	Date	Subject
Next Message	Chris	2006-03-20 05:49:41	Re: db sever seems to be dropping connections
Previous Message	Berend Tober	2006-03-20 02:01:42	Re: How to idenity duplicate rows