Re: Best strategy for bulk inserts where some violate unique constraint?

From: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
To: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>, Denis Papathanasiou <denis(dot)papathanasiou(at)gmail(dot)com>
Cc: "psycopg(at)postgresql(dot)org" <psycopg(at)postgresql(dot)org>
Subject: Re: Best strategy for bulk inserts where some violate unique constraint?
Date: 2013-11-05 23:38:58
Message-ID: 52798192.7040000@gmail.com
Lists: psycopg

On 11/05/2013 03:31 PM, Daniele Varrazzo wrote:

>
> As the last example do you mean the executemany() example? That is not
> going to be much faster than repeated execute().
>
> The easiest thing you can do is to switch to autocommit=True and do
> repeated execute() with INSERT. If one fails you can just ignore the
> IntegrityError and go on.
>
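For illustration, a minimal sketch of that autocommit approach (the
"items" table and its columns are made up here, adapt them to your
schema):

    import psycopg2

    conn = psycopg2.connect("dbname=test")
    conn.autocommit = True   # every execute() commits on its own
    cur = conn.cursor()

    records = [(1, 'first'), (1, 'duplicate'), (2, 'second')]
    for record in records:
        try:
            cur.execute("INSERT INTO items (id, body) VALUES (%s, %s)",
                        record)
        except psycopg2.IntegrityError:
            pass   # duplicate key: skip this record and carry on
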
> About as easy: you can prepare a statement and execute it repeatedly
> using PREPARE/EXECUTE: see
> <http://www.postgresql.org/docs/9.2/static/sql-prepare.html> There is
> no builtin support for that in psycopg but you can just execute()
> these statements. It may save you something. You can also take a look
> at this example of a PREPAREing cursor:
> <https://gist.github.com/dvarrazzo/3797445>.
>
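A rough sketch of that PREPARE/EXECUTE variant via plain execute()
calls (again with the made-up "items" table standing in for the real
one):

    import psycopg2

    conn = psycopg2.connect("dbname=test")
    conn.autocommit = True
    cur = conn.cursor()

    # plan the INSERT once; each EXECUTE reuses the prepared plan
    cur.execute("PREPARE ins (int, text) AS "
                "INSERT INTO items (id, body) VALUES ($1, $2)")

    records = [(1, 'first'), (1, 'duplicate'), (2, 'second')]
    for record in records:
        try:
            cur.execute("EXECUTE ins (%s, %s)", record)
        except psycopg2.IntegrityError:
            pass   # constraint violation: ignore and continue

    cur.execute("DEALLOCATE ins")
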
> However, the fastest way to insert data into Postgres is COPY, see
> <http://initd.org/psycopg/docs/cursor.html#cursor.copy_from>. You will
> have to present your data as a file. I can't remember what happens
> when a record fails the integrity test: I think the others would still
> be inserted, but you will have to check.

It will fail.

http://www.postgresql.org/docs/9.3/interactive/sql-copy.html

COPY stops operation at the first error. This should not lead to
problems in the event of a COPY TO, but the target table will already
have received earlier rows in a COPY FROM. These rows will not be
visible or accessible, but they still occupy disk space. This might
amount to a considerable amount of wasted disk space if the failure
happened well into a large copy operation. You might wish to invoke
VACUUM to recover the wasted space.

I would suggest checking out pg_loader; it is designed to deal with this
scenario.

> A much more robust strategy is
> to create a temporary table with the right schema but without
> constraints, load the data there using COPY and then move the data to
> the final table using INSERT INTO ... SELECT * FROM temp_table WHERE
> ... and specify a condition to avoid the records that would fail the
> constraint.
>
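For what it's worth, a rough sketch of that temp-table strategy (the
"items"/"items_tmp" names and columns are placeholders, and the WHERE
condition assumes the unique constraint is on "id"):

    import io
    import psycopg2

    conn = psycopg2.connect("dbname=test")
    cur = conn.cursor()

    # stage the data in an unconstrained temp table via COPY
    cur.execute("CREATE TEMP TABLE items_tmp (id int, body text)")
    buf = io.StringIO("1\tfirst\n1\tduplicate\n2\tsecond\n")
    cur.copy_from(buf, 'items_tmp', columns=('id', 'body'))

    # move only rows that won't collide with the unique index on id;
    # DISTINCT ON also weeds out duplicates within the batch itself
    cur.execute("""
        INSERT INTO items (id, body)
        SELECT DISTINCT ON (id) id, body
          FROM items_tmp t
         WHERE NOT EXISTS (SELECT 1 FROM items i WHERE i.id = t.id)
    """)
    conn.commit()
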
> I would go for COPY, it's by far faster than execute[many], even
> including prepare.
>
> -- Daniele
>
>

--
Adrian Klaver
adrian(dot)klaver(at)gmail(dot)com
