Re: Removing duplicate records from a bulk upload (rationale behind selecting a method)

From:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To:	Andy Colson <andy(at)squeakycode(dot)net>
Cc:	Daniel Begin <jfd553(at)hotmail(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Removing duplicate records from a bulk upload (rationale behind selecting a method)
Date:	2014-12-09 02:35:24
Message-ID:	CAOR=d=1jF7t1LKnAknrpSnXr_jF-MvVv6M0mT3paWdRob+5z_A@mail.gmail.com
Lists:	pgsql-general

If you're de-duping a whole table, no need to create indexes, as it's
gonna have to hit every row anyway. Fastest way I've found has been:

select a,b,c into newtable from oldtable group by a,b,c;

One pass, done.
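A slightly fuller sketch of that one-pass rebuild, assuming oldtable has only the columns a, b, c and that nothing else (foreign keys, views) depends on it. The rename step and the name oldtable_dupes are assumptions added for illustration, not part of the advice above:

```sql
BEGIN;

-- Build the de-duplicated copy in a single scan of oldtable.
-- CREATE TABLE AS is functionally similar to SELECT ... INTO.
CREATE TABLE newtable AS
SELECT a, b, c
FROM oldtable
GROUP BY a, b, c;

-- Optional swap (hypothetical): keep the original around under
-- another name until the new table has been checked.
ALTER TABLE oldtable RENAME TO oldtable_dupes;
ALTER TABLE newtable RENAME TO oldtable;

COMMIT;
```

Neither form copies indexes, constraints, or defaults, so those would need to be recreated on the new table afterwards.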

If you want to de-dupe on less than the whole row, you can use
distinct on:

select distinct on (col1, col2) * into newtable from oldtable;
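One thing to watch with distinct on: without an order by, which row survives from each (col1, col2) group is unpredictable. A minimal sketch that picks the survivor explicitly, assuming a hypothetical updated_at column that is not in the original example:

```sql
-- Keep one row per (col1, col2). updated_at is a hypothetical column,
-- used here only to decide which duplicate survives (the newest one).
-- The ORDER BY has to start with the DISTINCT ON expressions.
SELECT DISTINCT ON (col1, col2) *
INTO newtable
FROM oldtable
ORDER BY col1, col2, updated_at DESC;
```

If any one of the duplicates will do, the order by can be left out as in the shorter form.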