From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: How batch processing works
Date: 2024-09-21 19:15:44
Message-ID: 2fe0be15-f64e-465c-9dd2-b55c559ac7e2@aklaver.com
Lists: pgsql-general
On 9/21/24 07:36, Peter J. Holzer wrote:
> On 2024-09-21 16:44:08 +0530, Lok P wrote:
> ---------------------------------------------------------------------------------------------------
> #!/usr/bin/python3
>
> import time
> import psycopg2
>
> num_inserts = 10_000
> batch_size = 50
>
> db = psycopg2.connect()
> csr = db.cursor()
>
> csr.execute("drop table if exists parent_table")
> csr.execute("create table parent_table (id int primary key, t text)")
> db.commit()
>
> start_time = time.monotonic()
> for i in range(1, num_inserts+1):
>     csr.execute("insert into parent_table values(%s, %s)", (i, 'a'))
>     if i % batch_size == 0:
>         db.commit()
> db.commit()
> end_time = time.monotonic()
> elapsed_time = end_time - start_time
> print(f"Method 2: Individual Inserts with Commit after {batch_size} Rows: {elapsed_time:.3} seconds")
>
> # vim: tw=99
> ---------------------------------------------------------------------------------------------------
FYI, this is less of a problem with psycopg(3) and pipeline mode, which queues
the statements instead of waiting for a reply to each one:
import time
import psycopg
num_inserts = 10_000
batch_size = 50
db = psycopg.connect("dbname=test user=postgres host=104.237.158.68")
csr = db.cursor()
csr.execute("drop table if exists parent_table")
csr.execute("create table parent_table (id int primary key, t text)")
db.commit()
start_time = time.monotonic()
with db.pipeline():
    for i in range(1, num_inserts+1):
        csr.execute("insert into parent_table values(%s, %s)", (i, 'a'))
        if i % batch_size == 0:
            db.commit()
db.commit()
end_time = time.monotonic()
elapsed_time = end_time - start_time
print(f"Method 2: Individual Inserts(psycopg3 pipeline mode) with Commit after {batch_size} Rows: {elapsed_time:.3} seconds")
For a remote connection to a database in another state, this took the time from:
Method 2: Individual Inserts with Commit after 50 Rows: 2.42e+02 seconds
to:
Method 2: Individual Inserts(psycopg3 pipeline mode) with Commit after 50 Rows: 9.83 seconds
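
As an aside, not something I timed here: psycopg(3)'s executemany() uses the
same pipeline machinery under the hood (since 3.1), so a sketch like the one
below should land in the same ballpark without managing the pipeline
explicitly. The connection string is a placeholder:

import psycopg

with psycopg.connect("dbname=test") as db:
    with db.cursor() as csr:
        # executemany() batches the parameter sets over pipeline mode,
        # avoiding one network round trip per row.
        csr.executemany(
            "insert into parent_table values (%s, %s)",
            [(i, 'a') for i in range(1, 10_000 + 1)],
        )
    # the connection context manager commits on clean exit
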
> #!/usr/bin/python3
>
> import itertools
> import time
> import psycopg2
>
> num_inserts = 10_000
> batch_size = 50
>
> db = psycopg2.connect()
> csr = db.cursor()
>
> csr.execute("drop table if exists parent_table")
> csr.execute("create table parent_table (id int primary key, t text)")
> db.commit()
>
> start_time = time.monotonic()
> batch = []
> for i in range(1, num_inserts+1):
>     batch.append((i, 'a'))
>     if i % batch_size == 0:
>         q = "insert into parent_table values" + ",".join(["(%s, %s)"] * len(batch))
>         params = list(itertools.chain.from_iterable(batch))
>         csr.execute(q, params)
>         db.commit()
>         batch = []
> if batch:
>     q = "insert into parent_table values" + ",".join(["(%s, %s)"] * len(batch))
>     csr.execute(q, list(itertools.chain.from_iterable(batch)))
>     db.commit()
>     batch = []
>
> end_time = time.monotonic()
> elapsed_time = end_time - start_time
> print(f"Method 3: Batch Inserts ({batch_size}) with Commit after each batch: {elapsed_time:.3} seconds")
>
> # vim: tw=99
> ---------------------------------------------------------------------------------------------------
The above can also be handled with execute_batch() and execute_values()
from:
https://www.psycopg.org/docs/extras.html#fast-execution-helpers
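
For example, a rough sketch of method 3 using execute_values() (untested
here; it reuses csr, num_inserts and batch_size from the script above):

from psycopg2.extras import execute_values

# execute_values() expands the single %s placeholder into a multi-row
# VALUES list and sends page_size rows per statement, so the manual
# string building from method 3 goes away.
execute_values(
    csr,
    "insert into parent_table values %s",
    [(i, 'a') for i in range(1, num_inserts + 1)],
    page_size=batch_size,
)
db.commit()
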
>
> On my laptop, method2 is about twice as fast as method3. But if I
> connect to a database on the other side of the city, method2 is now more
> than 16 times slower than method3, simply because the delay in
> communication is now large compared to the time it takes to insert
> those rows.
>
> hp
>
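To put rough numbers on that: method2 issues one INSERT per row plus a COMMIT
every 50 rows, i.e. roughly 10,200 round trips for 10,000 rows, while method3
issues about 400. At the ~24 ms per round trip implied by the 242 second
remote run above (242 s / ~10,200 trips), the network alone accounts for
essentially all of method2's remote elapsed time.
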
--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com