Re: speed concerns with executemany()

From: Dorian Hoxha <dorian(dot)hoxha(at)gmail(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>, Christophe Pettus <xof(at)thebuild(dot)com>, mike bayer <mike_mp(at)zzzcomputing(dot)com>, "psycopg(at)postgresql(dot)org" <psycopg(at)postgresql(dot)org>
Subject: Re: speed concerns with executemany()
Date: 2016-12-31 07:42:31
Message-ID: CANsFX04okXoU9W2ij7ujzyD0Pp+ONcbAEHO9CSuQORkkejAo=g@mail.gmail.com
Lists: psycopg

Since Python lists are backed by fixed-size arrays that are reallocated
to a larger size whenever they fill up, it's better to build a list in
one go instead of appending to it item by item (each time the array
fills, a new, larger one is allocated and the existing elements are
copied over, again and again as the list grows).
For example:
`sqls = [self.mogrify(sql, args) for args in argslist]`
Even better, make it a tuple instead of a list if it never needs to change.
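Concretely, a minimal sketch of the two patterns (cursor and variable
names here are illustrative, not taken from the gist):

# Build all the mogrified statements in one expression; returning a
# tuple also signals that the batch is not meant to change afterwards.
def build_statements(cur, sql, argslist):
    return tuple(cur.mogrify(sql, args) for args in argslist)

# The append-per-item style advised against above.
def build_statements_append(cur, sql, argslist):
    sqls = []
    for args in argslist:
        sqls.append(cur.mogrify(sql, args))
    return sqls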

On Sat, Dec 31, 2016 at 12:55 AM, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
wrote:

> On 12/30/2016 02:24 PM, Daniele Varrazzo wrote:
>
>> The implementation of executemany as described by me a few days ago is
>> available in this gist, not heavily tested:
>>
>> https://gist.github.com/dvarrazzo/4204cca5d1cd7c9c95bc814d81e6b83e
>>
>> I would like to know if anyone sees any shortcoming in this new
>> implementation.
>>
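For anyone reading without the gist handy, the "joined" approach
benchmarked below amounts to something like this (a rough sketch;
function and parameter names are mine, and the real implementation in
the gist handles more details):

def executemany_joined(cur, sql, argslist, page_size=100):
    # Mogrify each parameter set, join the statements with ';' and
    # send them one page at a time, so each network round trip carries
    # page_size statements instead of one.
    for i in range(0, len(argslist), page_size):
        page = argslist[i:i + page_size]
        cur.execute(b';'.join(cur.mogrify(sql, args) for args in page))
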
>
> A quick test. I added an argument to change the page_size on the command
> line:
>
> With NRECS=10000:
>
> aklaver(at)tito:~> python psycopg_executemany.py -p 10
> classic: 0.800544023514 sec
> joined: 0.514330863953 sec
> aklaver(at)tito:~> python psycopg_executemany.py -p 100
> classic: 0.780461072922 sec
> joined: 0.473304986954 sec
> aklaver(at)tito:~> python psycopg_executemany.py -p 1000
> classic: 0.820818901062 sec
> joined: 0.488647937775 sec
>
>
> With NRECS=100000:
>
> aklaver(at)tito:~> python psycopg_executemany.py -p 10
> classic: 7.78319811821 sec
> joined: 4.18683385849 sec
> aklaver(at)tito:~> python psycopg_executemany.py -p 100
> classic: 7.75992202759 sec
> joined: 4.06096816063 sec
> aklaver(at)tito:~> python psycopg_executemany.py -p 1000
> classic: 7.76269102097 sec
> joined: 4.12301802635 sec
>
>
> The relative difference between the classic and joined versions seems to
> hold; you just do not seem to get much benefit from changing the
> page_size. Not sure how much that matters, and you do get a clear benefit
> from the joined solution.
>
>
>
>> -- Daniele
>>
>> On Sun, Dec 25, 2016 at 10:11 AM, Dorian Hoxha <dorian(dot)hoxha(at)gmail(dot)com>
>> wrote:
>>
>>> Sending stuff in big batches + autocommit (fast transactions) + few
>>> network calls is performance 101, I thought. I think it should be
>>> documented what "executemany" actually does (it looked suspicious
>>> when I saw it a long time ago, which is why I didn't use it).
>>>
>>> On Sat, Dec 24, 2016 at 6:00 AM, Adrian Klaver <
>>> adrian(dot)klaver(at)aklaver(dot)com>
>>> wrote:
>>>
>>>>
>>>> On 12/23/2016 06:57 PM, Christophe Pettus wrote:
>>>>
>>>>>
>>>>>
>>>>> On Dec 23, 2016, at 18:55, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
>>>>>> wrote:
>>>>>> Alright, that I get. Still, the practical outcome is that each
>>>>>> INSERT is being done in a transaction (an implicit one), so the
>>>>>> transaction overhead comes into play. Or am I missing something?
>>>>>>
>>>>>
>>>>>
>>>>> Nope, not missing a thing. The theory (and it is only that) is that
>>>>> when
>>>>> they do the .executemany(), each of those INSERTs pays the transaction
>>>>> overhead, while if they do one big INSERT, just that one statement
>>>>> does.
>>>>>
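The difference is easy to see side by side (a sketch; "rows" is a
hypothetical list of 3-tuples, and the table is the one from the test
below):

sql = "INSERT INTO psycopg_table VALUES (%s, %s, %s)"

# executemany(): one INSERT per parameter set, so under autocommit each
# statement pays its own implicit-transaction overhead.
cur.executemany(sql, rows)

# One big INSERT: a single statement, hence a single transaction.
values = b','.join(cur.mogrify("(%s, %s, %s)", row) for row in rows)
cur.execute(b"INSERT INTO psycopg_table VALUES " + values)
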
>>>>
>>>>
>>>> Just ran a quick and dirty test using IPython %timeit.
>>>>
>>>> With a list of 200 tuples, each containing 3 integers, INSERTing into:
>>>> test=> \d psycopg_table
>>>>   Table "public.psycopg_table"
>>>>  Column |  Type   | Modifiers
>>>> --------+---------+-----------
>>>>  a      | integer |
>>>>  b      | integer |
>>>>  c      | integer |
>>>>
>>>>
>>>> The results were:
>>>>
>>>> sql = "INSERT INTO psycopg_table VALUES(%s, %s, %s)"
>>>>
>>>> Without autocommit:
>>>>
>>>> In [65]: timeit -n 10 cur.executemany(sql, l)
>>>> 10 loops, best of 3: 12.5 ms per loop
>>>>
>>>>
>>>> With autocommit:
>>>>
>>>> In [72]: timeit -n 10 cur.executemany(sql, l)
>>>> 10 loops, best of 3: 1.71 s per loop
>>>>
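For reference, a sketch of how the autocommit toggle in this test might
be set up (connection details assumed; the timings above came from
IPython's %timeit):

import psycopg2

conn = psycopg2.connect("dbname=test")
cur = conn.cursor()
sql = "INSERT INTO psycopg_table VALUES (%s, %s, %s)"
l = [(i, i + 1, i + 2) for i in range(200)]

conn.autocommit = False  # default: all 200 INSERTs share one transaction
cur.executemany(sql, l)
conn.commit()

conn.autocommit = True   # each INSERT runs in its own implicit transaction
cur.executemany(sql, l)
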
>>>>
>>>>
>>>>> --
>>>>> -- Christophe Pettus
>>>>> xof(at)thebuild(dot)com
>>>>>
>>>>>
>>>>
>>>> --
>>>> Adrian Klaver
>>>> adrian(dot)klaver(at)aklaver(dot)com
>>>>
>>>>
>>>>
>>>> --
>>>> Sent via psycopg mailing list (psycopg(at)postgresql(dot)org)
>>>> To make changes to your subscription:
>>>> http://www.postgresql.org/mailpref/psycopg
>>>>
>>>
>>>
>>>
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>
