Re: JDBC slow performance on resultSet iteration?

From: Edson Richter <edsonrichter(at)hotmail(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: JDBC slow performance on resultSet iteration?
Date: 2013-11-05 18:50:24
Message-ID: BLU0-SMTP19936C630AE4F366E25A5F3CFF10@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc

Em 05/11/2013 16:45, Kevin Wooten escreveu:
> Well yes and no. The driver may be transferring your data in binary
> already but we should work out if that is actually true. I believe
> that option "-Dorg.postgresql.forcebinary=boolean" still works and
> forces an Extended Query which will allow binary transfers.
>
> Try running with that option and see if it changes the timings.
>
> On Nov 5, 2013, at 11:27 AM, Rui Pedro Leal <rui(dot)pedro(dot)leal(at)gmail(dot)com
> <mailto:rui(dot)pedro(dot)leal(at)gmail(dot)com>> wrote:
>
>> Hello Kevin,
>>
>> That's exactly my immediate (and hopefully wrong) conclusion: the
>> getXXX calls are really costly!
>>
>> So, you are proposing using the binary protocol. I've never known the
>> existence of that. Is this it:
>> http://wiki.postgresql.org/wiki/JDBC-BinaryTransfer ?
>>
>> Thanks.
>>
>> [] Rui Pedro Leal
>> [] rui(dot)pedro(dot)leal(at)gmail(dot)com <mailto:rui(dot)pedro(dot)leal(at)gmail(dot)com>
>>
>> On 2013/11/05, at 18:14, Kevin Wooten wrote:
>>
>>> Well I can speak from experience...
>>>
>>> When I was comparing the performance of my implementation with
>>> pgjdbc I was very disheartened at the beginning as pgjdbc was
>>> seemingly much faster. It was only after I switched the test to
>>> take into account the getXXX calls that the true cost of using the
>>> driver was discovered. If you just execute a query and ignore the
>>> results pgjdbc is essentially just receiving a stream of text. It
>>> ignores all of the work of turning that text into Java
>>> objects/primitives.
>>>
>>> My implementation uses binary protocol for this reason and that
>>> removes a lot of the overhead that pgjdbc has. If pgAdmin is using
>>> the binary protocol as well this could very well be the difference
>>> that is being seen, because it is a significant amount of time.
>>>
>>> On Nov 5, 2013, at 11:03 AM, Dave Cramer <pg(at)fastcrypt(dot)com
>>> <mailto:pg(at)fastcrypt(dot)com>> wrote:
>>>
>>>> I do not know, the best way to figure this out is to use explain
>>>> analyze in pg_admin.
>>>>
>>>> What I am debating is the java code is using a cursor. pgAdmin is not.
>>>>
>>>> Try it without setFetchSize()
>>>>
>>>> Dave Cramer
>>>>
>>>> dave.cramer(at)credativ(dot)ca
>>>> http://www.credativ.ca <http://www.credativ.ca/>
>>>>
>>>>
>>>> On Tue, Nov 5, 2013 at 1:00 PM, Rui Pedro Leal
>>>> <rui(dot)pedro(dot)leal(at)gmail(dot)com <mailto:rui(dot)pedro(dot)leal(at)gmail(dot)com>> wrote:
>>>>
>>>> Dave,
>>>>
>>>> I understand what you are saying, but i must get the results,
>>>> right? :)
>>>>
>>>> I think in pgAdmin the total query time, presented by the query
>>>> execution windows, is also considering the rendering /
>>>> presenting of the query results, so this is comparable ... right?
>>>>
>>>> [] Rui Pedro Leal
>>>> [] rui(dot)pedro(dot)leal(at)gmail(dot)com <mailto:rui(dot)pedro(dot)leal(at)gmail(dot)com>
>>>>
>>>> On 2013/11/05, at 17:34, Dave Cramer wrote:
>>>>
>>>>> Rui Leal,
>>>>>
>>>>> You shouldn't see that much of a penatly, but I question how
>>>>> you are measuring it. Since you are timing the loop which is
>>>>> using cursors.
>>>>>
>>>>> Just run a simple query and time it without iterating through
>>>>> the result set. That would be equivalent
>>>>>
>>>>> Dave Cramer
>>>>>
>>>>> dave.cramer(at)credativ(dot)ca
>>>>> http://www.credativ.ca <http://www.credativ.ca/>
>>>>>
>>>>>
>>>>> On Tue, Nov 5, 2013 at 12:16 PM, Rui Pedro Leal
>>>>> <rui(dot)pedro(dot)leal(at)gmail(dot)com <mailto:rui(dot)pedro(dot)leal(at)gmail(dot)com>>
>>>>> wrote:
>>>>>
>>>>> Hello Dave, thanks for the quick reply.
>>>>>
>>>>> I'm just executing the same query on pgAdmin query
>>>>> interface. Just a way to measure performance of the query
>>>>> + getting results vs the JDBC way.
>>>>>
>>>>> I'm using fetchsize in order to get some faster results
>>>>> when iterating the resultSet. I've tried not to use
>>>>> fetchSize/cursor but the performance is naturally slower
>>>>> (and worser than the simple pgAdmin query).
>>>>>
>>>>> What i'm asking is .. is this performance penalty i'm
>>>>> getting using postgres JDBC normal and within what's
>>>>> expected? I'm i doing something wrong? Can i do it in any
>>>>> other way?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Rui Leal
>>>>>
>>>>> [] Rui Pedro Leal
>>>>> [] rui(dot)pedro(dot)leal(at)gmail(dot)com <mailto:rui(dot)pedro(dot)leal(at)gmail(dot)com>
>>>>>
>>>>>
>>>>> On 2013/11/05, at 16:43, Dave Cramer wrote:
>>>>>
>>>>>> How can you measure the same query in pgAdmin ? JDBC will
>>>>>> use a cursor when you set the fetchsize.
>>>>>>
>>>>>> Dave Cramer
>>>>>>
>>>>>> dave.cramer(at)credativ(dot)ca
>>>>>> http://www.credativ.ca <http://www.credativ.ca/>
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 5, 2013 at 11:12 AM, Rui Pedro Leal
>>>>>> <rui(dot)pedro(dot)leal(at)gmail(dot)com
>>>>>> <mailto:rui(dot)pedro(dot)leal(at)gmail(dot)com>> wrote:
>>>>>>
>>>>>> Just an quick info update: i'm using the
>>>>>> postgresql-9.2-1003-jdbc4 driver
>>>>>>
>>>>>> On 2013/11/05, at 16:09, Rui Pedro Leal wrote:
>>>>>>
>>>>>> > Hello everyone,
>>>>>> >
>>>>>> > I'm having slow JDBC performance on iterating a
>>>>>> resultSet obtained from a somewhat simple query.
>>>>>> >
>>>>>> > The query, although using PostGIS functions, is
>>>>>> pretty straight forward:
>>>>>> >
>>>>>> > SELECT _id, _spatial_type, ST_AsBinary(_geometry),
>>>>>> _attribute, _count, _references, countpersons,
>>>>>> countfatals, countdrunks, density
>>>>>> > FROM accidents_5
>>>>>> > WHERE (_geometry &&
>>>>>> ST_MakeEnvelope(-126.60644531250001,26.43122806450644,-63.369140625,52.96187505907603,
>>>>>> 4326) )
>>>>>> > ORDER BY _pk_id
>>>>>> >
>>>>>> > The results are the following:
>>>>>> > - Executing the query directly from pgAdmin: ~2807ms
>>>>>> > - Executing from JVM + JDBC: 4184ms
>>>>>> >
>>>>>> > The code i'm executing is pretty much standard:
>>>>>> >
>>>>>> > -------------------- code --------------------
>>>>>> >
>>>>>> > public static void main(final String[] args) throws
>>>>>> Exception {
>>>>>> >
>>>>>> > String sql = "SELECT _id, _spatial_type,
>>>>>> ST_AsBinary(_geometry), _attribute, _count,
>>>>>> _references, countpersons, countfatals, countdrunks,
>>>>>> density";
>>>>>> > sql += " FROM accidents_5";
>>>>>> > sql += " WHERE (_geometry &&
>>>>>> ST_MakeEnvelope(-126.60644531250001,26.43122806450644,-63.369140625,52.96187505907603,
>>>>>> 4326) )";
>>>>>> > sql += " ORDER BY _pk_id";
>>>>>> >
>>>>>> > System.out.println(sql);
>>>>>> >
>>>>>> > Long time = System.currentTimeMillis();
>>>>>> > try {
>>>>>> > Connection connection =
>>>>>> DataStoreInfo.getDataStores().get(0);
>>>>>> > connection.setAutoCommit(false);
>>>>>> >
>>>>>> > System.out.println("[QUERY " +
>>>>>> Thread.currentThread().getId()
>>>>>> > + "] - connection in "
>>>>>> > + (System.currentTimeMillis() - time) + "ms.");
>>>>>> >
>>>>>> > Statement st = connection.createStatement(
>>>>>> > ResultSet.TYPE_FORWARD_ONLY,
>>>>>> ResultSet.CONCUR_READ_ONLY,
>>>>>> > ResultSet.CLOSE_CURSORS_AT_COMMIT);
>>>>>> > st.setFetchSize(250);
>>>>>> >
>>>>>> > System.out.println("[QUERY " +
>>>>>> Thread.currentThread().getId()
>>>>>> > + "] - statement in " + (System.currentTimeMillis()
>>>>>> - time)
>>>>>> > + "ms.");
>>>>>> >
>>>>>> > ResultSet resultSet = st.executeQuery(sql);
>>>>>> >
>>>>>> > System.out.println("[QUERY " +
>>>>>> Thread.currentThread().getId()
>>>>>> > + "] - done in " + (System.currentTimeMillis() - time)
>>>>>> > + "ms.");
>>>>>> >
>>>>>> > GeometryFactory geofact = new GeometryFactory(new
>>>>>> PrecisionModel(),4326);
>>>>>> > WKBReader wkbReader = new WKBReader(geofact);
>>>>>> >
>>>>>> > String id;
>>>>>> > String spatialType;
>>>>>> > Geometry geometry;
>>>>>> > String attribute;
>>>>>> > Long count;
>>>>>> > String reference;
>>>>>> > Map<String, Double> properties;
>>>>>> > String granularSynthString = "GranularSynthesis";
>>>>>> >
>>>>>> > while (resultSet.next()) {
>>>>>> > id = resultSet.getString(1);
>>>>>> > spatialType = resultSet.getString(2);
>>>>>> >
>>>>>> > // geometry =
>>>>>> wkbReader.read(resultSet.getBytes(3)); // ignored the
>>>>>> WKBReader and the results are about the same.
>>>>>> > attribute = resultSet.getString(4);
>>>>>> > count = resultSet.getLong(5);
>>>>>> > reference = resultSet.getString(6);
>>>>>> >
>>>>>> > properties = new HashMap<String, Double>();
>>>>>> > Double aux = resultSet.getDouble(7);
>>>>>> > properties.put("countpersons", aux);
>>>>>> > aux = resultSet.getDouble(8);
>>>>>> > properties.put("countfatals", aux);
>>>>>> > aux = resultSet.getDouble(9);
>>>>>> > properties.put("countdrunks", aux);
>>>>>> > aux = resultSet.getDouble(10);
>>>>>> > properties.put("density", aux);
>>>>>> > }
>>>>>> >
>>>>>> > System.out.println("[QUERY " +
>>>>>> Thread.currentThread().getId()
>>>>>> > + "] - done & iterated in "
>>>>>> > + (System.currentTimeMillis() - time) + "ms.");
>>>>>> >
>>>>>> > resultSet.close();
>>>>>> > st.close();
>>>>>> > connection.commit();
>>>>>> > connection.close();
>>>>>> > } catch (SQLException exception) {
>>>>>> > exception.printStackTrace();
>>>>>> > // } catch (ParseException exception) { //
>>>>>> ignored from WKBreader
>>>>>> > // exception.printStackTrace();
>>>>>> > }
>>>>>> >
>>>>>> > System.out.println("[End " +
>>>>>> Thread.currentThread().getId()
>>>>>> > + "] - done in " +
>>>>>> (System.currentTimeMillis() - time) + "ms.");
>>>>>> > }
>>>>>> >
>>>>>> > -------------------- end code --------------------
>>>>>> >
>>>>>> > Although i'm executing this on slow MacbookPro (2.6
>>>>>> core duo, 2Gbs RAM but SSD) and have a 9.1.2
>>>>>> postgres, i've also tested this on a recent retina
>>>>>> MBP and the ratio between pgAdmin and JDBC execution
>>>>>> is similar.
>>>>>> >
>>>>>> > Is this expected? Can someone point if i'm doing
>>>>>> something terrible wrong?
>>>>>> >
>>>>>> > I'm not concerned about the query performance
>>>>>> per-se (i know it CAN be optimized), but the
>>>>>> differences just using JDBC and iterating the
>>>>>> resultSet are really annoying.
>>>>>> >
>>>>>> > Thanks in advance for any help.
>>>>>> >
>>>>>> > Kind regards,
>>>>>> >
>>>>>> > Rui Leal
>>>>>> >
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sent via pgsql-jdbc mailing list
>>>>>> (pgsql-jdbc(at)postgresql(dot)org
>>>>>> <mailto:pgsql-jdbc(at)postgresql(dot)org>)
>>>>>> To make changes to your subscription:
>>>>>> http://www.postgresql.org/mailpref/pgsql-jdbc
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
Does this patches already integrated into main stream driver source code?

Edson

In response to

Responses

Browse pgsql-jdbc by date

  From Date Subject
Next Message Kevin Wooten 2013-11-05 18:55:27 Re: JDBC slow performance on resultSet iteration?
Previous Message Kevin Wooten 2013-11-05 18:45:55 Re: JDBC slow performance on resultSet iteration?