Re: SQL for Deleting all duplicate entries

From: Håkan Jacobsson <hakan(dot)jacobsson99(at)bredband(dot)net>
To: <mmoncure(at)gmail(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: SQL for Deleting all duplicate entries
Date: 2007-09-09 12:28:51
Message-ID: 17416430.759551189340931591.JavaMail.root@ps2.bredband.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Merlin,

Its just about three columns - not any column. Two columns are
varchars and the third is
a date. The date column value is NULL for the rows for which
I want to delete the duplicates.

Yes, please, be a bit more specific!

/regards, Håkan Jacobsson

>----Ursprungligt meddelande----
>Från: mmoncure(at)gmail(dot)com
>Datum: 06-09-2007 01:56
>Till: "Håkan Jacobsson"<hakan(dot)jacobsson99(at)bredband(dot)net>
>Kopia: <pgsql-general(at)postgresql(dot)org>
>Ärende: Re: [GENERAL] SQL for Deleting all duplicate entries
>
>On 9/5/07, Håkan Jacobsson <hakan(dot)jacobsson99(at)bredband(dot)net>
wrote:
>> Hi,
>>
>> I want to create a DELETE statement which deletes
duplicates
>> in a table.
>>
>> That is, I want to remove all rows - but one - having three
>> columns with the same data (more columns exist and there
the
>> data varies).
>> For example:
>> column1
>> column2
>> column3
>> column4
>> column5
>>
>> column2 = 'test', column3 = 'hey' and column4 IS NULL for
>> several rows in the table. I want to keep just one of those
>> rows.
>>
>> Is this possible? I can't figure it out, so any help MUCH
>> appreciated!
>
>when removing duplicates, I find it is usually better to look
at this
>problem backwards...you want to select out the data you want
to keep,
>truncate the original table, and insert select the data back
in.
>
>What isn't exactly clear from your question is if you are
interested
>in only particular fields or if you want to throw out based
on any
>columns (nut just 2, 3, and 4). If so, this is a highly
irregular
>(and interesting) problem, and should prove difficult to make
>efficient.
>
>If you are only interested in three particular columns, then
it's easy.
>1. select out data you want to keep using create table
scratch SELECT
>DISTINCT ON or GROUP BY into scratch
>2. truncate main table
>3. insert into main select * from scratch
>
>for a more specific answer, you will have to provide some
more detail,
>especially regarding exactly how you determine two rows as
being
>'duplicates'.
>
>merlin
>

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Merlin Moncure 2007-09-09 13:42:05 Re: SQL for Deleting all duplicate entries
Previous Message Filip Rembiałkowski 2007-09-09 11:01:52 Re: work hour calculations