| From: | Frank Bax <fbax(at)sympatico(dot)ca> |
|---|---|
| To: | pgsql-sql(at)postgresql(dot)org |
| Subject: | Re: De-duplicating rows |
| Date: | 2009-07-17 15:04:48 |
| Message-ID: | 4A609310.4050003@sympatico.ca |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-sql |
Richard Huxton wrote:
> Christophe wrote:
>> Now, since this database has been production since 7.2 days, cruft has
>> crept in: in particular, there are duplicate email addresses, some
>> with mismatched attributes. The policy decision by the client is that
>> the correct row is the one with the earliest timestamp.
>
> Something like (untested):
>
> CREATE TEMPORARY TABLE earliest_duplicates AS
> SELECT
> email AS tgt_email,
> min(create_date) AS tgt_date
> FROM mytable
> GROUP BY email
> HAVING count(*) > 1;
>
> DELETE FROM mytable USING earliest duplicates
> WHERE email=tgt_email AND create_date > tgt_date;
If it is possible that two rows exist for the same email/date; then you
will likely need to deal with these manually.
If you rerun the above SELECT after running the delete you should
identify these rows.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Scott Marlowe | 2009-07-17 15:32:55 | Re: De-duplicating rows |
| Previous Message | Richard Huxton | 2009-07-17 07:21:01 | Re: De-duplicating rows |