From: | David Johnston <polobo(at)yahoo(dot)com> |
---|---|
To: | Rich Shepard <rshepard(at)appl-ecosys(dot)com> |
Cc: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: COPY from .csv File and Remove Duplicates |
Date: | 2011-08-11 23:50:19 |
Message-ID: | 6A1FF13C-7526-4296-A132-B4F4988F5E17@yahoo.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Aug 11, 2011, at 19:13, Rich Shepard <rshepard(at)appl-ecosys(dot)com> wrote:
> A table has a sequence to generate a primary key for inserted records with
> NULLs in that column.
>
> I have a .csv file of approximately 10k rows to copy into this table. My
> two questions which have not been answered by reference to my postgres
> reference book or Google searches are:
>
> 1) Will the sequence automatically add the nextval() to each new record as
> the copy command runs?
>
> 2) Many of these rows almost certainly are already in the table. I would
> like to remove duplicates either during the COPY command or immediately
> after. I'm considering copying the new data into a clone of the table then
> running a SELECT to add only those rows in the new cloned table to the
> existing table.
>
> Suggestions on getting these data in without duplicating existing rows
> will be very helpful.
>
If you have duplicates with matching real keys inserting into a staging table and then moving new records to the final table is your best option (in general it is better to do a two-step with a staging table since you can readily use Postgresql to perform any intermediate translations) As for the import itself, I believe if the column with the sequence is present in the csv the sequence will not be used and, if no value is present, a null will be stored for that column - causing any not-null constraint to throw an error. In this case I would just import the data to a staging table without any kind of artificial key, just the true key, and then during the merge with the live table you simply omit the pk field from the insert statement and the sequence will kick in at that point.
David J.
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2011-08-11 23:54:37 | Re: Regex Query Index question |
Previous Message | David Johnston | 2011-08-11 23:35:43 | Re: Regex Query Index question |