| From: | Edson Richter <edsonrichter(at)hotmail(dot)com> | 
|---|---|
| To: | pgsql-jdbc(at)postgresql(dot)org | 
| Subject: | Re: JDBC slow performance on resultSet iteration? | 
| Date: | 2013-11-05 18:50:24 | 
| Message-ID: | BLU0-SMTP19936C630AE4F366E25A5F3CFF10@phx.gbl | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-jdbc | 
Em 05/11/2013 16:45, Kevin Wooten escreveu:
> Well yes and no.  The driver may be transferring your data in binary 
> already but we should work out if that is actually true.  I believe 
> that option "-Dorg.postgresql.forcebinary=boolean" still works and 
> forces an Extended Query which will allow binary transfers.
>
> Try running with that option and see if it changes the timings.
>
> On Nov 5, 2013, at 11:27 AM, Rui Pedro Leal <rui(dot)pedro(dot)leal(at)gmail(dot)com 
> <mailto:rui(dot)pedro(dot)leal(at)gmail(dot)com>> wrote:
>
>> Hello Kevin,
>>
>> That's exactly my immediate (and hopefully wrong) conclusion: the 
>> getXXX calls are really costly!
>>
>> So, you are proposing using the binary protocol. I've never known the 
>> existence of that. Is this it: 
>> http://wiki.postgresql.org/wiki/JDBC-BinaryTransfer ?
>>
>> Thanks.
>>
>> [] Rui Pedro Leal
>> [] rui(dot)pedro(dot)leal(at)gmail(dot)com <mailto:rui(dot)pedro(dot)leal(at)gmail(dot)com>
>>
>> On 2013/11/05, at 18:14, Kevin Wooten wrote:
>>
>>> Well I can speak from experience...
>>>
>>> When I was comparing the performance of my implementation with 
>>> pgjdbc I was very disheartened at the beginning as pgjdbc was 
>>> seemingly much faster.  It was only after I switched the test to 
>>> take into account the getXXX calls that the true cost of using the 
>>> driver was discovered.  If you just execute a query and ignore the 
>>> results pgjdbc is essentially just receiving a stream of text.  It 
>>> ignores all of the work of turning that text into Java 
>>> objects/primitives.
>>>
>>> My implementation uses binary protocol for this reason and that 
>>> removes a lot of the overhead that pgjdbc has.  If pgAdmin is using 
>>> the binary protocol as well this could very well be the difference 
>>> that is being seen, because it is a significant amount of time.
>>>
>>> On Nov 5, 2013, at 11:03 AM, Dave Cramer <pg(at)fastcrypt(dot)com 
>>> <mailto:pg(at)fastcrypt(dot)com>> wrote:
>>>
>>>> I do not know, the best way to figure this out is to use explain 
>>>> analyze in pg_admin.
>>>>
>>>> What I am debating is the java code is using a cursor. pgAdmin is not.
>>>>
>>>> Try it without setFetchSize()
>>>>
>>>> Dave Cramer
>>>>
>>>> dave.cramer(at)credativ(dot)ca
>>>> http://www.credativ.ca <http://www.credativ.ca/>
>>>>
>>>>
>>>> On Tue, Nov 5, 2013 at 1:00 PM, Rui Pedro Leal 
>>>> <rui(dot)pedro(dot)leal(at)gmail(dot)com <mailto:rui(dot)pedro(dot)leal(at)gmail(dot)com>> wrote:
>>>>
>>>>     Dave,
>>>>
>>>>     I understand what you are saying, but i must get the results,
>>>>     right? :)
>>>>
>>>>     I think in pgAdmin the total query time, presented by the query
>>>>     execution windows, is also considering the rendering /
>>>>     presenting of the query results, so this is comparable ... right?
>>>>
>>>>     [] Rui Pedro Leal
>>>>     [] rui(dot)pedro(dot)leal(at)gmail(dot)com <mailto:rui(dot)pedro(dot)leal(at)gmail(dot)com>
>>>>
>>>>     On 2013/11/05, at 17:34, Dave Cramer wrote:
>>>>
>>>>>     Rui Leal,
>>>>>
>>>>>     You shouldn't see that much of a penatly, but I question how
>>>>>     you are measuring it. Since you are timing the loop which is
>>>>>     using cursors.
>>>>>
>>>>>     Just run a simple query and time it without iterating through
>>>>>     the result set. That would be equivalent
>>>>>
>>>>>     Dave Cramer
>>>>>
>>>>>     dave.cramer(at)credativ(dot)ca
>>>>>     http://www.credativ.ca <http://www.credativ.ca/>
>>>>>
>>>>>
>>>>>     On Tue, Nov 5, 2013 at 12:16 PM, Rui Pedro Leal
>>>>>     <rui(dot)pedro(dot)leal(at)gmail(dot)com <mailto:rui(dot)pedro(dot)leal(at)gmail(dot)com>>
>>>>>     wrote:
>>>>>
>>>>>         Hello Dave, thanks for the quick reply.
>>>>>
>>>>>         I'm just executing the same query on pgAdmin query
>>>>>         interface. Just a way to measure performance of the query
>>>>>         + getting results vs the JDBC way.
>>>>>
>>>>>         I'm using fetchsize in order to get some faster results
>>>>>         when iterating the resultSet. I've tried not to use
>>>>>         fetchSize/cursor but the performance is naturally slower
>>>>>         (and worser than the simple pgAdmin query).
>>>>>
>>>>>         What i'm asking is .. is this performance penalty i'm
>>>>>         getting using postgres JDBC normal and within what's
>>>>>         expected? I'm i doing something wrong? Can i do it in any
>>>>>         other way?
>>>>>
>>>>>         Regards,
>>>>>
>>>>>         Rui Leal
>>>>>
>>>>>         [] Rui Pedro Leal
>>>>>         [] rui(dot)pedro(dot)leal(at)gmail(dot)com <mailto:rui(dot)pedro(dot)leal(at)gmail(dot)com>
>>>>>
>>>>>
>>>>>         On 2013/11/05, at 16:43, Dave Cramer wrote:
>>>>>
>>>>>>         How can you measure the same query in pgAdmin ? JDBC will
>>>>>>         use a cursor when you set the fetchsize.
>>>>>>
>>>>>>         Dave Cramer
>>>>>>
>>>>>>         dave.cramer(at)credativ(dot)ca
>>>>>>         http://www.credativ.ca <http://www.credativ.ca/>
>>>>>>
>>>>>>
>>>>>>         On Tue, Nov 5, 2013 at 11:12 AM, Rui Pedro Leal
>>>>>>         <rui(dot)pedro(dot)leal(at)gmail(dot)com
>>>>>>         <mailto:rui(dot)pedro(dot)leal(at)gmail(dot)com>> wrote:
>>>>>>
>>>>>>             Just an quick info update: i'm using the
>>>>>>             postgresql-9.2-1003-jdbc4 driver
>>>>>>
>>>>>>             On 2013/11/05, at 16:09, Rui Pedro Leal wrote:
>>>>>>
>>>>>>             > Hello everyone,
>>>>>>             >
>>>>>>             > I'm having slow JDBC performance on iterating a
>>>>>>             resultSet obtained from a somewhat simple query.
>>>>>>             >
>>>>>>             > The query, although using PostGIS functions, is
>>>>>>             pretty straight forward:
>>>>>>             >
>>>>>>             > SELECT _id, _spatial_type, ST_AsBinary(_geometry),
>>>>>>             _attribute, _count, _references, countpersons,
>>>>>>             countfatals, countdrunks, density
>>>>>>             > FROM accidents_5
>>>>>>             > WHERE (_geometry &&
>>>>>>             ST_MakeEnvelope(-126.60644531250001,26.43122806450644,-63.369140625,52.96187505907603,
>>>>>>             4326) )
>>>>>>             > ORDER BY _pk_id
>>>>>>             >
>>>>>>             > The results are the following:
>>>>>>             > - Executing the query directly from pgAdmin: ~2807ms
>>>>>>             > - Executing from JVM + JDBC: 4184ms
>>>>>>             >
>>>>>>             > The code i'm executing is pretty much standard:
>>>>>>             >
>>>>>>             > -------------------- code --------------------
>>>>>>             >
>>>>>>             > public static void main(final String[] args) throws
>>>>>>             Exception {
>>>>>>             >
>>>>>>             >     String sql = "SELECT _id, _spatial_type,
>>>>>>             ST_AsBinary(_geometry), _attribute, _count,
>>>>>>             _references, countpersons, countfatals, countdrunks,
>>>>>>             density";
>>>>>>             >     sql += " FROM accidents_5";
>>>>>>             >     sql += " WHERE (_geometry &&
>>>>>>             ST_MakeEnvelope(-126.60644531250001,26.43122806450644,-63.369140625,52.96187505907603,
>>>>>>             4326) )";
>>>>>>             >     sql += " ORDER BY _pk_id";
>>>>>>             >
>>>>>>             > System.out.println(sql);
>>>>>>             >
>>>>>>             >     Long time = System.currentTimeMillis();
>>>>>>             >     try {
>>>>>>             > Connection connection =
>>>>>>             DataStoreInfo.getDataStores().get(0);
>>>>>>             > connection.setAutoCommit(false);
>>>>>>             >
>>>>>>             > System.out.println("[QUERY " +
>>>>>>             Thread.currentThread().getId()
>>>>>>             > + "] - connection in "
>>>>>>             > + (System.currentTimeMillis() - time) + "ms.");
>>>>>>             >
>>>>>>             > Statement st = connection.createStatement(
>>>>>>             > ResultSet.TYPE_FORWARD_ONLY,
>>>>>>             ResultSet.CONCUR_READ_ONLY,
>>>>>>             > ResultSet.CLOSE_CURSORS_AT_COMMIT);
>>>>>>             > st.setFetchSize(250);
>>>>>>             >
>>>>>>             > System.out.println("[QUERY " +
>>>>>>             Thread.currentThread().getId()
>>>>>>             > + "] - statement in " + (System.currentTimeMillis()
>>>>>>             - time)
>>>>>>             > + "ms.");
>>>>>>             >
>>>>>>             > ResultSet resultSet = st.executeQuery(sql);
>>>>>>             >
>>>>>>             > System.out.println("[QUERY " +
>>>>>>             Thread.currentThread().getId()
>>>>>>             > + "] - done in " + (System.currentTimeMillis() - time)
>>>>>>             > + "ms.");
>>>>>>             >
>>>>>>             > GeometryFactory geofact = new GeometryFactory(new
>>>>>>             PrecisionModel(),4326);
>>>>>>             > WKBReader wkbReader = new WKBReader(geofact);
>>>>>>             >
>>>>>>             > String id;
>>>>>>             > String spatialType;
>>>>>>             > Geometry geometry;
>>>>>>             > String attribute;
>>>>>>             > Long count;
>>>>>>             > String reference;
>>>>>>             > Map<String, Double> properties;
>>>>>>             > String granularSynthString = "GranularSynthesis";
>>>>>>             >
>>>>>>             > while (resultSet.next()) {
>>>>>>             >       id = resultSet.getString(1);
>>>>>>             > spatialType = resultSet.getString(2);
>>>>>>             >
>>>>>>             >       // geometry =
>>>>>>             wkbReader.read(resultSet.getBytes(3)); // ignored the
>>>>>>             WKBReader and the results are about the same.
>>>>>>             > attribute = resultSet.getString(4);
>>>>>>             >       count = resultSet.getLong(5);
>>>>>>             > reference = resultSet.getString(6);
>>>>>>             >
>>>>>>             > properties = new HashMap<String, Double>();
>>>>>>             >       Double aux = resultSet.getDouble(7);
>>>>>>             > properties.put("countpersons", aux);
>>>>>>             >       aux = resultSet.getDouble(8);
>>>>>>             > properties.put("countfatals", aux);
>>>>>>             >       aux = resultSet.getDouble(9);
>>>>>>             > properties.put("countdrunks", aux);
>>>>>>             >       aux = resultSet.getDouble(10);
>>>>>>             > properties.put("density", aux);
>>>>>>             >             }
>>>>>>             >
>>>>>>             > System.out.println("[QUERY " +
>>>>>>             Thread.currentThread().getId()
>>>>>>             > + "] - done & iterated in "
>>>>>>             > + (System.currentTimeMillis() - time) + "ms.");
>>>>>>             >
>>>>>>             > resultSet.close();
>>>>>>             > st.close();
>>>>>>             > connection.commit();
>>>>>>             > connection.close();
>>>>>>             >     } catch (SQLException exception) {
>>>>>>             > exception.printStackTrace();
>>>>>>             >     // } catch (ParseException exception) {  //
>>>>>>             ignored from WKBreader
>>>>>>             >     // exception.printStackTrace();
>>>>>>             >     }
>>>>>>             >
>>>>>>             > System.out.println("[End " +
>>>>>>             Thread.currentThread().getId()
>>>>>>             >       + "] - done in " +
>>>>>>             (System.currentTimeMillis() - time) + "ms.");
>>>>>>             >       }
>>>>>>             >
>>>>>>             > -------------------- end code --------------------
>>>>>>             >
>>>>>>             > Although i'm executing this on slow MacbookPro (2.6
>>>>>>             core duo, 2Gbs RAM but SSD) and have a 9.1.2
>>>>>>             postgres, i've also tested this on a recent retina
>>>>>>             MBP and the ratio between pgAdmin and JDBC execution
>>>>>>             is similar.
>>>>>>             >
>>>>>>             > Is this expected? Can someone point if i'm doing
>>>>>>             something terrible wrong?
>>>>>>             >
>>>>>>             > I'm not concerned about the query performance
>>>>>>             per-se (i know it CAN be optimized), but the
>>>>>>             differences just using JDBC and iterating the
>>>>>>             resultSet are really annoying.
>>>>>>             >
>>>>>>             > Thanks in advance for any help.
>>>>>>             >
>>>>>>             > Kind regards,
>>>>>>             >
>>>>>>             > Rui Leal
>>>>>>             >
>>>>>>             >
>>>>>>
>>>>>>
>>>>>>
>>>>>>             --
>>>>>>             Sent via pgsql-jdbc mailing list
>>>>>>             (pgsql-jdbc(at)postgresql(dot)org
>>>>>>             <mailto:pgsql-jdbc(at)postgresql(dot)org>)
>>>>>>             To make changes to your subscription:
>>>>>>             http://www.postgresql.org/mailpref/pgsql-jdbc
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
Does this patches already integrated into main stream driver source code?
Edson
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Kevin Wooten | 2013-11-05 18:55:27 | Re: JDBC slow performance on resultSet iteration? | 
| Previous Message | Kevin Wooten | 2013-11-05 18:45:55 | Re: JDBC slow performance on resultSet iteration? |