Speed improvement - Group batch Insert - Rewrite the INSERT at the driver level (using a parameter)

From: "J(dot) W(dot) Ulbts" <magog001(at)web(dot)de>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Speed improvement - Group batch Insert - Rewrite the INSERT at the driver level (using a parameter)
Date: 2009-07-18 13:51:31
Message-ID: 828427796@web.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc

Hello!

It would be great if someone would add a parameter and code to the driver which would allow the JDBC driver to group a list of INSERT INTO added to a batch (Prepared Statement and the addBatch(..) method).
This way the INSERT INTO from the batch could be rewritten in a single INSERT INTO which is at least 3 times faster compared to the current behaviour. At least that was the result of my tests.

To make clear what I mean a simple example.

The normal batch would execute this:
--------------------
INSERT INTO sometable (col1, col2) VALUES (1, 'word1');
INSERT INTO sometable (col1, col2) VALUES (2, 'word2');
INSERT INTO sometable (col1, col2) VALUES (3, 'word3');
INSERT INTO sometable (col1, col2) VALUES (4, 'word4');
--------------------

If the batch insert is rewritten by the driver it would generate and execute this:
--------------------
INSERT INTO sometable (col1, col2) VALUES (1, 'word1'), (2, 'word2'), (3, 'word3'), (4, 'word4');
--------------------

If this is already possible with the PostgreSQL JDBC driver I must have missed it.

I ran into this by one of my users pointing me to the multi value/grouped insert statement, and then I finally saw that Mark Matthews has just improved the MySQL JDBC driver in this area. He's talking about 10x performance increase but that's basically because the MySQL server isn't optimizing as good as PostgreSQL. ;)
I could "only" see an improvement by factor 5 with my test data and MySQL InnoDB (IMDb - Keywords.list; reading data from file creating INSERT INTO all in one transaction). It's possible that MyISAM are up to 10x faster.

In my tests where I added the values for the INSERT INTO statement into two Lists (ArrayList; one for the key column and one for the data column) and generated the "INSERT INTO sometable (col1, col2) VALUES (x,y)[, (x,y)];" in my Java code it was 3 times faster compared to the normal PreparedStatement addBatch with it's batchExecute for PostgreSQL.
I could get the same speed improvement with my manually generated INSERT INTO as with the new JDBC driver for MySQL where I turned on the "rewriteBatchedStatements" parameter.
I tried column groups (the "(x,y)") with sizes between 250 and 5000 (in steps of 250: so 250, 500, 750,...,5000). The speed was pretty stable at 3x for PostgreSQL but the larger the group the more memory is used to store the data you are trying to insert.

Here is the blog entry of Mark Matthews (MySQL Connector/J developer) about speeding up Batch Inserts:
http://www.jroller.com/mmatthews/entry/speeding_up_batch_inserts_for

The upcoming MySQL driver supporting this feature (Connector/J 5.1 GA and 6.0 Alpha) can be downloaded here:
http://downloads.mysql.com/snapshots.php

Currently the MySQL driver has an error if your prepared INSERT INTO statement ends with a ";" (which is not needed except in a SQL script but several developers like myself normally also add this to the SQL Command in the Java code).
The driver would create an INSERT INTO statment that looks like this:
INSERT INTO sometable1 (col1, col2) VALUES (1, 'word');, (2, 'word2'), ...

It would be great if someone could implement this into the JDBC driver and the grouping could be turned on by a URL driver parameter like e.g MySQL Connector/J (?rewriteBatchedStatements=true).
Maybe there are also comment about this. So what do you think?

--
Juergen
______________________________________________________
GRATIS für alle WEB.DE-Nutzer: Die maxdome Movie-FLAT!
Jetzt freischalten unter http://movieflat.web.de

Responses

Browse pgsql-jdbc by date

  From Date Subject
Next Message Oliver Jowett 2009-07-18 14:25:20 Re: Speed improvement - Group batch Insert - Rewrite the INSERT at the driver level (using a parameter)
Previous Message John R Pierce 2009-07-17 22:29:35 Re: java 1.4 for how long ?