Re: Best strategy for bulk inserts where some violate unique constraint?

From: Denis Papathanasiou <denis(dot)papathanasiou(at)gmail(dot)com>
To:
Cc: "psycopg(at)postgresql(dot)org" <psycopg(at)postgresql(dot)org>
Subject: Re: Best strategy for bulk inserts where some violate unique constraint?
Date: 2013-11-06 14:50:04
Message-ID: CAEpnaGwf4DazeGy_tGTPZg7w9wixJvufDM0HJNkBLZ-zB2PHmw@mail.gmail.com
Lists: psycopg

On Tue, Nov 5, 2013 at 6:31 PM, Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com> wrote:

> By the last example, do you mean the executemany() example?

Yes, I was referring to this code example:

> If you had the following:
>
> namedict = ({"first_name":"Joshua", "last_name":"Drake"},
> {"first_name":"Steven", "last_name":"Foo"},
> {"first_name":"David", "last_name":"Bar"})
>
> You could easily insert all three rows within the dictionary by using:
>
> cur = conn.cursor()
> cur.executemany("""INSERT INTO bar(first_name,last_name) VALUES
(%(first_name)s, %(last_name)s)""", namedict)
>
> The cur.executemany statement will automatically iterate through the
dictionary and execute the INSERT query for each row.

What I wound up doing in my own code is this:

# process each {columns=values} dict
try:
    cur.executemany("INSERT INTO ...", insert_dict)
except psycopg2.IntegrityError:
    conn.commit()

I was hoping that, by catching the unique constraint error and then
committing, it would at least preserve the inserts from the dicts that do
not violate the table constraints, but that is not how Postgres works:
once any statement in the transaction fails, the whole transaction is
aborted, and the COMMIT is effectively turned into a ROLLBACK, so none of
the earlier inserts survive.

> That is not
> going to be much faster than repeated execute().
>

Really?

> The easiest thing you can do is to switch to autocommit=True and do
> repeated execute with insert. If one fails you can just ignore the
> IntegrityError and go on.
>

Ok, I'll try that, thanks.
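
Just to make sure I've understood, I take it that looks something like the
sketch below (reusing the bar table and namedict from the example above;
the connection string is a placeholder):

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder connection string
conn.autocommit = True  # every execute() now commits on its own
cur = conn.cursor()

for row in namedict:
    try:
        cur.execute("""INSERT INTO bar(first_name,last_name) VALUES
(%(first_name)s, %(last_name)s)""", row)
    except psycopg2.IntegrityError:
        # duplicate key: skip this row; with autocommit there is no open
        # transaction to abort, so the rows already inserted survive
        pass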

> About as easily, you can prepare a statement and execute it repeatedly
> using PREPARE/EXECUTE: see
> <http://www.postgresql.org/docs/9.2/static/sql-prepare.html> There is
> no builtin support for that in psycopg but you can just execute()
> these statements. It may save you something. You can also take a look
> at this example of a PREPAREing cursor:
> <https://gist.github.com/dvarrazzo/3797445>.
>

Thanks for the example.
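
For the archives, here is how I read the PREPARE/EXECUTE idea using plain
execute() calls (a sketch on the same placeholder table; the statement
name "ins" is mine, and it assumes the autocommit connection from above):

# prepared once per session; parameters are referenced as $1, $2
cur.execute("""PREPARE ins (text, text) AS
INSERT INTO bar(first_name,last_name) VALUES ($1, $2)""")

for row in namedict:
    try:
        cur.execute("EXECUTE ins (%(first_name)s, %(last_name)s)", row)
    except psycopg2.IntegrityError:
        pass  # still relies on autocommit to keep the other rows

cur.execute("DEALLOCATE ins")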

> However, the fastest way to insert data into Postgres is COPY, see
> <http://initd.org/psycopg/docs/cursor.html#cursor.copy_from>. You will
> have to present your data as a file. I can't remember what happens
> when a record fails the integrity test: I think the others would still
> be inserted but you will have to check. A much more robust strategy is
> to create a temporary table with the right schema but without
> constraints, load the data there using COPY and then move the data to
> the final table using INSERT INTO ... SELECT * FROM temp_table WHERE
> ... and specify a condition to avoid the records that would fail the
> constraint.
>
> I would go for COPY, it's by far faster than execute[many], even
> including prepare.
>

I saw Adrian's reply about how this still won't do what I need in terms of
ignoring the constraint violators and keeping the valid entries; will
switching autocommit to True first have the desired effect?
