Re: speed concerns with executemany()

From: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
To: Federico Di Gregorio <fog(at)dndg(dot)it>
Cc: "psycopg(at)postgresql(dot)org" <psycopg(at)postgresql(dot)org>
Subject: Re: speed concerns with executemany()
Date: 2017-01-05 19:00:28
Message-ID: CA+mi_8bVHi_wkZBhDS-Wib9-5kkLjFR09d7BHtHBO7-MStcX=Q@mail.gmail.com
Lists: psycopg

On Thu, Jan 5, 2017 at 5:32 PM, Federico Di Gregorio <fog(at)dndg(dot)it> wrote:
> On 02/01/17 17:07, Daniele Varrazzo wrote:
>>
>> On Mon, Jan 2, 2017 at 4:35 PM, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
>> wrote:
>>>
>>> With NRECS=10000 and page size=100:
>>>
>>> aklaver(at)tito:~> python psycopg_executemany.py -p 100
>>> classic: 427.618795156 sec
>>> joined: 7.55754685402 sec
>>
>> Ugh! :D
>
>
> That's great. Just a minor point: I won't overload executemany() with this
> feature but add a new method UNLESS the semantics are exactly the same
> especially regarding session isolation. Also, right now psycopg keeps track
> of the number of affected rows over executemany() calls: I'd like to not
> lose that because it is a breaking change to the API.

It seems to me that the semantics would stay the same, even in the
presence of volatile functions. Unfortunately, however, rowcount would
break. That's just sad.

We could add, without problems, an extra argument to executemany(): a
page_size defaulting to 1 (the previous behaviour), which users could
bump up. It's sad the default cannot be 100.
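To make the idea concrete, here is a minimal sketch of the page-splitting approach, not psycopg's actual implementation: instead of one server round trip per parameter tuple, join page_size mogrified tuples into a single statement. The names (paginate, executemany_paged), the separate per-row template argument, and the "%s placeholder in the SQL" convention are illustrative assumptions.

```python
def paginate(seq, page_size):
    """Yield successive lists of at most page_size items from seq."""
    page = []
    for item in seq:
        page.append(item)
        if len(page) == page_size:
            yield page
            page = []
    if page:
        yield page

def executemany_paged(cur, sql, template, argslist, page_size=1):
    """Sketch only. sql is e.g. b"INSERT INTO t (a, b) VALUES %s" and
    template the per-row placeholder, e.g. b"(%s, %s)". With page_size=1
    this behaves like the classic executemany(); larger pages mean fewer
    round trips, which is where the speedup comes from."""
    for page in paginate(argslist, page_size):
        # mogrify() renders each tuple into an escaped SQL fragment
        values = b",".join(cur.mogrify(template, args) for args in page)
        # assumes the statement contains exactly one %s to substitute
        cur.execute(sql.replace(b"%s", values))
```

With page_size=1 each page holds a single tuple, so the behaviour (and per-statement rowcount) matches today's executemany(); only with larger pages does the aggregation problem appear.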

Mike Bayer reported (https://github.com/psycopg/psycopg2/issues/491)
that SQLAlchemy actually uses the aggregated rowcount for concurrency
control.
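To see why the aggregated rowcount matters for that use case, here is a hedged illustration of optimistic concurrency control, simulated in plain Python without a database (the dict-based rows store and function name are assumptions, not SQLAlchemy's code): a versioned UPDATE is expected to match exactly one row, and a lower-than-expected rowcount signals a stale writer.

```python
def apply_versioned_updates(rows, updates):
    """Each update is (id, expected_version, new_value); return the number
    of rows actually changed, mimicking what cursor.rowcount reports for
    UPDATE ... WHERE id = %s AND version = %s."""
    matched = 0
    for id_, expected_version, new_value in updates:
        row = rows.get(id_)
        if row is not None and row["version"] == expected_version:
            row["value"] = new_value
            row["version"] += 1
            matched += 1
    return matched

rows = {1: {"version": 1, "value": "a"}, 2: {"version": 1, "value": "b"}}
# The second update carries a stale version, so only one row matches;
# the caller detects the conflict because rowcount < len(updates).
changed = apply_versioned_updates(rows, [(1, 1, "x"), (2, 99, "y")])
assert changed == 1
```

If executemany() pages several such updates into one statement and only the last page's rowcount survives, this conflict check silently stops working, which is exactly the breakage being discussed.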

So, how much of a deal-breaker is it? Can we afford to lose the
aggregated rowcount to obtain a juicy speedup in default usage, or
would we rather leave the behaviour untouched and have people "opt in
for speed"?

ponder, ponder...

Pondered: as the feature has had little testing and I don't want to
delay releasing 2.7 further, I'd rather release it with a page_size
default of 1. People could use it and report any failures they hit
with a page_size > 1. If testing confirms that the database behaves
ok, we could think about changing the default in the future. We may
also want to drop the aggregated rowcount eventually, but with better
planning, e.g. to allow SQLAlchemy to ignore the aggregated rowcount
from psycopg >= 2.8...

How does it sound?

-- Daniele
