Quick Links

Re: COPY from .csv File and Remove Duplicates

From:	Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To:	David Johnston <polobo(at)yahoo(dot)com>
Cc:	'Rich Shepard' <rshepard(at)appl-ecosys(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: COPY from .csv File and Remove Duplicates
Date:	2011-08-12 03:04:20
Message-ID:	4E449834.1020405@ringerc.id.au
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On 12/08/2011 10:32 AM, David Johnston wrote:

> The general structure for the insert would be:
>
> INSERT INTO maintable (cols)
> SELECT cols FROM staging WHERE staging.idcols NOT IN (SELECT
> maintable.idcols FROM maintable);
>
> There may be more efficient ways to write the query but the idea is the
> same.

Yeah... I'd favour an EXISTS test or a join.

INSERT INTO maintable (cols)
SELECT cols FROM staging WHERE NOT EXISTS (SELECT
1 FROM maintable WHERE maintable.idcol = staging.idcol);

... as the NOT IN(...) test can have less than lovely behavior for large
key sets.

--
Craig Ringer

In response to

Re: COPY from .csv File and Remove Duplicates at 2011-08-12 02:32:37 from David Johnston

Browse pgsql-general by date

	From	Date	Subject
Next Message	Greg Smith	2011-08-12 04:57:15	Re: Postgres on SSD
Previous Message	David Johnston	2011-08-12 02:32:37	Re: COPY from .csv File and Remove Duplicates