From: | Frank Bax <fbax(at)sympatico(dot)ca> |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Subject: | Re: De-duplicating rows |
Date: | 2009-07-17 15:04:48 |
Message-ID: | 4A609310.4050003@sympatico.ca |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
Richard Huxton wrote:
> Christophe wrote:
>> Now, since this database has been production since 7.2 days, cruft has
>> crept in: in particular, there are duplicate email addresses, some
>> with mismatched attributes. The policy decision by the client is that
>> the correct row is the one with the earliest timestamp.
>
> Something like (untested):
>
> CREATE TEMPORARY TABLE earliest_duplicates AS
> SELECT
> email AS tgt_email,
> min(create_date) AS tgt_date
> FROM mytable
> GROUP BY email
> HAVING count(*) > 1;
>
> DELETE FROM mytable USING earliest duplicates
> WHERE email=tgt_email AND create_date > tgt_date;
If it is possible that two rows exist for the same email/date; then you
will likely need to deal with these manually.
If you rerun the above SELECT after running the delete you should
identify these rows.
From | Date | Subject | |
---|---|---|---|
Next Message | Scott Marlowe | 2009-07-17 15:32:55 | Re: De-duplicating rows |
Previous Message | Richard Huxton | 2009-07-17 07:21:01 | Re: De-duplicating rows |