From: Daniel Begin <jfd553(at)hotmail(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Removing duplicate records from a bulk upload
Date: 2014-12-08 03:31:36
Message-ID: COL129-DS2247F5657A1B6114A3819194640@phx.gbl
Lists: pgsql-general
I have just completed the bulk upload of a large database. Some tables have billions of records, and no constraints or indexes have been applied yet. About 0.1% of these records may have been duplicated during the upload, and I need to remove them before applying constraints.
I understand there are (at least) two approaches to get a table without duplicate records…
- Delete duplicate records from the table based on an appropriate select clause;
- Create a new table with the results from a select distinct clause, and then drop the original table (rough sketches of both options below).
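To make the two options concrete, here is roughly what I have in mind (a sketch only; big_table and the columns col1, col2, col3 are placeholders for my actual tables, and the duplicates are assumed to be exact row-for-row copies):

-- Option 1: delete the extra copies in place. With no key and no
-- OIDs, ctid identifies a row's physical location; row_number()
-- keeps one arbitrary row per duplicate group.
DELETE FROM big_table
WHERE ctid IN (
    SELECT ctid
    FROM (SELECT ctid,
                 row_number() OVER (PARTITION BY col1, col2, col3) AS rn
          FROM big_table) s
    WHERE rn > 1
);

-- Option 2: rewrite the table without duplicates, then swap it in.
CREATE TABLE big_table_dedup AS SELECT DISTINCT * FROM big_table;
DROP TABLE big_table;
ALTER TABLE big_table_dedup RENAME TO big_table;

If I understand correctly, option 2 rewrites the whole table, while option 1 only deletes the ~0.1% of duplicated rows but still has to scan and sort the table to find them.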
What would be the most efficient procedure in PostgreSQL to do the job considering …
- I do not know which records were duplicated;
- No indexes have been applied to the tables yet;
- There are no OIDs on the tables yet;
- The database is currently 1 TB, but I have plenty of disk space.
Daniel