Re: Removing duplicate records from a bulk upload (rationale behind selecting a method)

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Andy Colson <andy(at)squeakycode(dot)net>
Cc: Daniel Begin <jfd553(at)hotmail(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Removing duplicate records from a bulk upload (rationale behind selecting a method)
Date: 2014-12-09 02:35:24
Message-ID: CAOR=d=1jF7t1LKnAknrpSnXr_jF-MvVv6M0mT3paWdRob+5z_A@mail.gmail.com
Lists: pgsql-general

If you're de-duping a whole table, there's no need to create indexes, as
it's gonna have to hit every row anyway. The fastest way I've found has been:

select a,b,c into newtable from oldtable group by a,b,c;

One pass, done.
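
To make the GROUP BY approach concrete, here is a minimal sketch on a throwaway table. The sample rows, the column types, and the rename step at the end are assumptions for illustration and are not part of the original suggestion. Keep in mind that SELECT INTO copies only the data, so any indexes, constraints, and defaults have to be recreated on the new table.

```sql
-- Hypothetical sample table; column types and rows are illustrative only.
CREATE TABLE oldtable (a int, b text, c date);
INSERT INTO oldtable VALUES
    (1, 'x', '2014-12-01'),
    (1, 'x', '2014-12-01'),  -- exact duplicate of the row above
    (2, 'y', '2014-12-02');

-- Grouping by every column keeps one row per distinct (a, b, c).
-- The source table is read in full, so no index is needed.
SELECT a, b, c INTO newtable FROM oldtable GROUP BY a, b, c;

-- Sanity check before swapping anything.
SELECT count(*) FROM newtable;  -- 2 for the sample rows above

-- Assumed follow-up: swap the tables, keeping the original under a new name.
BEGIN;
ALTER TABLE oldtable RENAME TO oldtable_dupes;
ALTER TABLE newtable RENAME TO oldtable;
COMMIT;
```

Keeping the original around as oldtable_dupes (a made-up name) leaves a way back until the de-duped copy has been checked.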

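If you want to de-dupe on fewer than all the columns, you can use
DISTINCT ON:

select distinct on (col1, col2) * into newtable from oldtable;

Without an ORDER BY, which row survives in each (col1, col2) group is
unpredictable.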
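
As a sketch of the DISTINCT ON variant, the example below adds an ORDER BY so the surviving row in each group is well defined. The table names events_raw and events_dedup and the columns payload and loaded_at are hypothetical; the assumption is that the most recently loaded row is the one worth keeping.

```sql
-- Hypothetical table: (col1, col2) defines a duplicate, loaded_at picks the survivor.
CREATE TABLE events_raw (col1 int, col2 text, payload text, loaded_at timestamp);
INSERT INTO events_raw VALUES
    (1, 'a', 'first',  '2014-12-01 10:00'),
    (1, 'a', 'second', '2014-12-01 11:00'),  -- same key, loaded later
    (2, 'b', 'only',   '2014-12-01 12:00');

-- DISTINCT ON keeps the first row of each (col1, col2) group in ORDER BY order.
-- The ORDER BY must start with the DISTINCT ON expressions; the extra
-- loaded_at DESC term makes the newest row come first within each group.
SELECT DISTINCT ON (col1, col2) *
INTO events_dedup
FROM events_raw
ORDER BY col1, col2, loaded_at DESC;

-- Expect two rows: (1, 'a', 'second', ...) and (2, 'b', 'only', ...).
SELECT col1, col2, payload FROM events_dedup ORDER BY col1, col2;
```

Swapping loaded_at DESC for a different tiebreaker changes which duplicate is kept without touching the rest of the statement.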
