Batches of single-insert statements vs batches of multi-insert statements

From: Christopher Deckers <chrriis(at)gmail(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Batches of single-insert statements vs batches of multi-insert statements
Date: 2016-06-05 18:17:54
Message-ID: CADFnS4TP2tqF5qXDpvAJykdREoZtAQ9nJZNrN3Xh4MQxTfAv7g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc

Hi all,

I tried all sorts of query rewrite to boost a huge sequence of insert
statements and here are my findings. All is run on a local Postgres 9.5.3,
on a 64-bit Windows box with 32GB RAM, i7(at)3(dot)6GHz(dot)

1. Batches of single insert statements (original case):
Batch of: INSERT INTO SomeTable (Col1, Col2) VALUES (?, ?)
-> 52 seconds

2. Batches of single insert statements to an unlogged table, then copy:
CREATE UNLOGGED TABLE SomeUnloggedTable AS SELECT * FROM SomeTable WHERE 1
= 0
Batch of: INSERT INTO SomeUnloggedTable (Col1, Col2) VALUES (?, ?)
INSERT INTO SomeTable SELECT * FROM SomeUnloggedTable
DROP TABLE SomeUnloggedTable
-> 43 seconds (15 seconds in the copy from unlogged to target table)

3. Batches of 100 multi-insert (followed by batches of single inserts for
remainder):
Batch of: INSERT INTO SomeTable (Col1, Col2) VALUES (?, ?), (?, ?), (?,
?)...
-> 30 seconds

So, rewriting the batch of single insert statements to batches of 100
multi-insert statements almost doubled the speed.

Could someone educate me as to what happens internally to the JDBC
single-insert batch that makes it so different from a multi-insert batch?

Rewriting our code to use multi-value inserts is more cumbersome and not
applicable everywhere. It also makes the code database-specific as all the
engines we support do not all handle this syntax.
These considerations actually lead me to this old thread:
https://www.postgresql.org/message-id/55130DC8.2070508%40redhat.com

Anyway, these tests definitely show that the performance of the standard
JDBC batch API could potentially be improved in Postgres internally with a
bit of magic :)
Isn't it possible (either in the driver or in the engine) to aggregate the
single-insert statements of a batch to turn them into a multi-insert
equivalent? It would not have to handle all cases: simply handling basic
commands like "INSERT INTO X (a, b, c, ...) VALUES (?, ?, ?, ...)" would
help a lot.
If this should not be done in the JDBC driver, then in which area could
such handling be done?
Or have I missed something?

If needed, I am ready to offer my assistance with testing (I can tweak
parameters, install dev drivers, modify my code to run alternate
statements, experiment, etc.).

Cheers,
-Christopher

Responses

Browse pgsql-jdbc by date

  From Date Subject
Next Message Vladimir Sitnikov 2016-06-05 18:33:46 Re: Batches of single-insert statements vs batches of multi-insert statements
Previous Message Vladimir Sitnikov 2016-06-01 11:23:29 Re: Re: [BUGS] BUG #14166: JDBC driver won't parse error message from DB server in German