From: Denis Papathanasiou <denis(dot)papathanasiou(at)gmail(dot)com>
Cc: "psycopg(at)postgresql(dot)org" <psycopg(at)postgresql(dot)org>
Subject: Re: Best strategy for bulk inserts where some violate unique constraint?
Date: 2013-11-06 14:50:04
Message-ID: CAEpnaGwf4DazeGy_tGTPZg7w9wixJvufDM0HJNkBLZ-zB2PHmw@mail.gmail.com
Lists: psycopg
On Tue, Nov 5, 2013 at 6:31 PM, Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com> wrote:
> As the last example do you mean the executemany() example?
Yes, I was referring to this code example:
> If you had the following:
>
> namedict = ({"first_name":"Joshua", "last_name":"Drake"},
>             {"first_name":"Steven", "last_name":"Foo"},
>             {"first_name":"David", "last_name":"Bar"})
>
> You could easily insert all three rows within the dictionary by using:
>
> cur = conn.cursor()
> cur.executemany("""INSERT INTO bar(first_name,last_name) VALUES
> (%(first_name)s, %(last_name)s)""", namedict)
>
> The cur.executemany statement will automatically iterate through the
> dictionary and execute the INSERT query for each row.
What I wound up doing in my own code is this:

    # process each {column: value} dict
    try:
        cur.executemany("INSERT INTO ...", insert_dict)
    except psycopg2.IntegrityError:
        conn.commit()

I was hoping that committing after catching the unique constraint error would at least preserve the inserts from the dict that do not violate the table constraints, but that is not how Postgres works: once any statement fails, the whole transaction is aborted, so none of the batch survives.
> That is not
> going to be much faster than repeated execute().
>
Really?
> The easiest thing you can do is to switch to autocommit=True and do
> repeated execute with insert. If one fail you can just ignore the
> IntegrityError and go on.
>
Ok, I'll try that, thanks.
> About as easy you can prepare a statement and execute it repeatedly
> using PREPARE/EXECUTE: see
> <http://www.postgresql.org/docs/9.2/static/sql-prepare.html> There is
> no builtin support for that in psycopg but you can just execute()
> these statements. It may save you something. You can also take a look
> at this example of a PREPAREing cursor:
> <https://gist.github.com/dvarrazzo/3797445>.
>
Thanks for the example.
> However, the fastest way to insert data into Postgres is COPY, see
> <http://initd.org/psycopg/docs/cursor.html#cursor.copy_from>. You will
> have to present your data as a file. I can't remember what happens
> when a record fails the integrity test: I think the other would still
> be inserted but you will have to check. A much more robust strategy is
> to create a temporary table with the right schema but without
> constraints, load the data there using COPY and then move the data to
> the final table using INSERT INTO ... SELECT * FROM temp_table WHERE
> ... and specify a condition to avoid the records that would fail the
> constraint.
>
> I would go for COPY, it's by far faster than execute[many], even
> including prepare.
>
I saw Adrian's reply about how this still won't do what I need in terms of
ignoring the constraint violators and keeping the valid entries; will
switching autocommit to True first have the desired effect?