From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: Rich Shepard <rshepard(at)appl-ecosys(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: COPY from .csv File and Remove Duplicates
Date: 2011-08-12 02:14:39
Message-ID: 4E448C8F.3070109@ringerc.id.au
Lists: pgsql-general
On 12/08/2011 7:13 AM, Rich Shepard wrote:
> I have a .csv file of approximately 10k rows to copy into this table. My
> two questions which have not been answered by reference to my postgres
> reference book or Google searches are:
>
> 1) Will the sequence automatically add the nextval() to each new record as
> the copy command runs?
No, AFAIK the COPY input needs to match the table structure; you can't
leave columns out and have them filled in from their defaults.
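For example, a bare COPY straight into the destination table (untested
sketch; "realtable" standing in for your table, file path made up)
would expect the .csv to supply every column in the table's declared
order:

    -- With no column list, the file must contain a value for every
    -- column of realtable, the id included:
    COPY realtable FROM '/path/to/data.csv' WITH CSV HEADER;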
> 2) Many of these rows almost certainly are already in the table. I would
> like to remove duplicates either during the COPY command or immediately
> after. I'm considering copying the new data into a clone of the table then
> running a SELECT to add only those rows in the new cloned table to the
> existing table.
Rather than removing them after, I'd recommend COPYing into a temporary
staging table, then doing an
INSERT INTO realtable (columns) SELECT stagingtable.columns FROM
stagingtable LEFT OUTER JOIN realtable ON (conditions) WHERE
realtable.primarykey IS NULL;
... where "conditions" are whatever rules you use to decide that a row
in the real table is the same as a row in the staging table.
In other words: only insert a row into the final destination table if it
isn't already there.
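Spelled out a little more (untested; the column names are made up, the
ON clause is only a guess at what identifies a duplicate in your data,
and I'm assuming the serial primary key is called id):

    -- Staging table mirrors the .csv layout, so no serial column is
    -- needed here.
    CREATE TEMPORARY TABLE stagingtable (
        site_name   text,
        sample_date date,
        reading     numeric
    );

    -- Use psql's \copy instead if the file is on your workstation
    -- rather than on the server.
    COPY stagingtable FROM '/path/to/data.csv' WITH CSV HEADER;

    -- Anti-join: only staging rows with no match in realtable get
    -- inserted. Because the id column is omitted here, the sequence
    -- supplies nextval() at INSERT time.
    INSERT INTO realtable (site_name, sample_date, reading)
    SELECT s.site_name, s.sample_date, s.reading
    FROM stagingtable s
    LEFT OUTER JOIN realtable r
           ON r.site_name = s.site_name
          AND r.sample_date = s.sample_date
    WHERE r.id IS NULL;

    DROP TABLE stagingtable;  -- or let it vanish when the session ends

If the .csv itself can contain the same row more than once, add a
DISTINCT to that SELECT as well.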
--
Craig Ringer