Quick Links

Re: De-duplicating rows

From:	Frank Bax <fbax(at)sympatico(dot)ca>
To:	pgsql-sql(at)postgresql(dot)org
Subject:	Re: De-duplicating rows
Date:	2009-07-17 15:04:48
Message-ID:	4A609310.4050003@sympatico.ca
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-sql

Richard Huxton wrote:
> Christophe wrote:
>> Now, since this database has been production since 7.2 days, cruft has
>> crept in: in particular, there are duplicate email addresses, some
>> with mismatched attributes. The policy decision by the client is that
>> the correct row is the one with the earliest timestamp.
>
> Something like (untested):
>
> CREATE TEMPORARY TABLE earliest_duplicates AS
> SELECT
> email AS tgt_email,
> min(create_date) AS tgt_date
> FROM mytable
> GROUP BY email
> HAVING count(*) > 1;
>
> DELETE FROM mytable USING earliest duplicates
> WHERE email=tgt_email AND create_date > tgt_date;

If it is possible that two rows exist for the same email/date; then you
will likely need to deal with these manually.

If you rerun the above SELECT after running the delete you should
identify these rows.

In response to

Re: De-duplicating rows at 2009-07-17 07:21:01 from Richard Huxton

Browse pgsql-sql by date

	From	Date	Subject
Next Message	Scott Marlowe	2009-07-17 15:32:55	Re: De-duplicating rows
Previous Message	Richard Huxton	2009-07-17 07:21:01	Re: De-duplicating rows