From: | Rich Shepard <rshepard(at)appl-ecosys(dot)com> |
---|---|
To: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: COPY from .csv File and Remove Duplicates |
Date: | 2011-08-12 15:27:09 |
Message-ID: | alpine.LNX.2.00.1108120822030.25454@salmo.appl-ecosys.com |
Lists: | pgsql-general |
On Thu, 11 Aug 2011, David Johnston wrote:
> If you have duplicates with matching real keys inserting into a staging
> table and then moving new records to the final table is your best option
> (in general it is better to do a two-step with a staging table since you
> can readily use PostgreSQL to perform any intermediate translations). As
> for the import itself,
It was probably the couple of days spent extracting very messy data from
Excel spreadsheets, and writing Python and awk scripts to transform it, that
caused me to miss the obvious: the multi-column primary key that I intended
to implement in the base table.
Trying to add a compound primary key using (loc_name, sample_date, param)
shows there are duplicates in the original data. While there are many slight
variations on the SELECT syntax for finding duplicates based on a single
column, I've not found working syntax for finding duplicate rows based on
the values in all three columns.
A pointer to the appropriate syntax for retrieving the entire row when
count(loc_name, sample_date, param) > 1 would be much appreciated.
Rich
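
One common way to express what is asked above, offered as a minimal sketch rather than a definitive answer: group on all three columns, keep the groups where count(*) > 1, and join back to the table to retrieve the entire rows. The table name chemistry, the quant column, and the sample rows are hypothetical; only loc_name, sample_date, and param come from the message. The sketch runs the query through Python's sqlite3 module so it is self-contained; the SELECT uses only standard SQL and should carry over to PostgreSQL, though that is not tested here.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical table: only the three key column names come from the message.
conn.execute(
    """
    CREATE TABLE chemistry (
        loc_name    TEXT,
        sample_date TEXT,
        param       TEXT,
        quant       REAL
    )
    """
)

# Made-up sample rows containing two duplicated (loc_name, sample_date, param) keys.
rows = [
    ("Site A", "2011-01-05", "pH", 7.1),
    ("Site A", "2011-01-05", "pH", 7.3),
    ("Site A", "2011-01-05", "TDS", 120.0),
    ("Site B", "2011-02-10", "pH", 6.9),
    ("Site B", "2011-02-10", "pH", 6.9),
    ("Site B", "2011-02-11", "pH", 7.0),
]
conn.executemany("INSERT INTO chemistry VALUES (?, ?, ?, ?)", rows)

# The subquery finds key combinations that occur more than once;
# the join pulls back every full row carrying one of those keys.
FIND_DUPLICATES = """
    SELECT c.*
    FROM chemistry AS c
    JOIN (
        SELECT loc_name, sample_date, param
        FROM chemistry
        GROUP BY loc_name, sample_date, param
        HAVING count(*) > 1
    ) AS d
      ON c.loc_name = d.loc_name
     AND c.sample_date = d.sample_date
     AND c.param = d.param
    ORDER BY c.loc_name, c.sample_date, c.param
"""

duplicates = conn.execute(FIND_DUPLICATES).fetchall()
for row in duplicates:
    print(row)
```

Rows whose key columns contain NULL would not match in the join, so they would need separate handling before the compound primary key can be added.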