Re: COPY from .csv File and Remove Duplicates

From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: David Johnston <polobo(at)yahoo(dot)com>
Cc: 'Rich Shepard' <rshepard(at)appl-ecosys(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: COPY from .csv File and Remove Duplicates
Date: 2011-08-12 03:04:20
Message-ID: 4E449834.1020405@ringerc.id.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 12/08/2011 10:32 AM, David Johnston wrote:

> The general structure for the insert would be:
>
> INSERT INTO maintable (cols)
> SELECT cols FROM staging WHERE staging.idcols NOT IN (SELECT
> maintable.idcols FROM maintable);
>
> There may be more efficient ways to write the query but the idea is the
> same.

Yeah... I'd favour an EXISTS test or a join.

INSERT INTO maintable (cols)
SELECT cols FROM staging WHERE NOT EXISTS (SELECT
1 FROM maintable WHERE maintable.idcol = staging.idcol);

... as the NOT IN(...) test can have less than lovely behavior for large
key sets.

--
Craig Ringer

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Greg Smith 2011-08-12 04:57:15 Re: Postgres on SSD
Previous Message David Johnston 2011-08-12 02:32:37 Re: COPY from .csv File and Remove Duplicates