new String(byte[]) performance

From: Teofilis Martisius <teo(at)teohome(dot)lzua(dot)lt>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: new String(byte[]) performance
Date: 2002-09-11 09:57:35
Message-ID: 20020911095735.GA6185@teohome.lzua.lt
Lists: pgsql-jdbc

Hello,

While reading through the PostgreSQL JDBC driver sources and profiling, I
noticed that the driver calls new String(byte[]) a lot while iterating over a
ResultSet, and that this String constructor takes a lot of
time. I wrote a custom byte[]->String conversion method for UTF-8 that
makes iterating over a ResultSet 2 times faster or even more. I have a patch
for the PostgreSQL JDBC driver, but this is a workaround and I am not
sure it will be accepted. It does speed things up by a noticeable amount.

Hmm, since the static cdata buffer is shared between threads, maybe
decodeUTF8() should be synchronized on cdata, or maybe cdata should be
allocated for each call. The static cdata version was faster.
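
One possible middle ground between a shared static buffer and a fresh allocation per call is a per-thread scratch buffer. The following is only a minimal sketch and not part of the patch: the class name ThreadLocalUtf8 is hypothetical, `length` is assumed to be a byte count, and the input is assumed to be well-formed UTF-8 with sequences of at most 3 bytes.

```java
final class ThreadLocalUtf8 {
    // One scratch buffer per thread, so concurrent callers never share state.
    private static final ThreadLocal<char[]> BUFFER =
            ThreadLocal.withInitial(() -> new char[50]);

    // Decodes `length` bytes starting at `offset`.
    // Assumption: well-formed UTF-8, no 4-byte sequences.
    static String decode(byte[] data, int offset, int length) {
        char[] cdata = BUFFER.get();
        if (cdata.length < length) {      // at most one char per input byte
            cdata = new char[length];
            BUFFER.set(cdata);
        }
        int end = offset + length;
        int i = offset;
        int j = 0;
        while (i < end) {
            int z = data[i] & 0xFF;
            if (z < 0x80) {               // 1 byte: 0xxxxxxx
                cdata[j++] = (char) z;
                i += 1;
            } else if (z >= 0xE0) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
                int y = data[i + 1] & 0x3F;
                int x = data[i + 2] & 0x3F;
                cdata[j++] = (char) (((z & 0x0F) << 12) | (y << 6) | x);
                i += 3;
            } else {                      // 2 bytes: 110xxxxx 10xxxxxx
                int y = data[i + 1] & 0x3F;
                cdata[j++] = (char) (((z & 0x1F) << 6) | y);
                i += 2;
            }
        }
        return new String(cdata, 0, j);
    }
}
```

Whether this is actually faster than synchronizing or allocating per call would have to be measured; the sketch only shows that the buffer can be reused without sharing it across threads.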

By the way, what should a JDBC driver do when, for example, ResultSet.getInt() is
called for a VARCHAR field? I would suggest converting byte arrays to
Strings, or even to more precisely typed values (Integers, Doubles and so
on), in QueryExecutor.execute(). This should save some memory allocation
for receiveTuple, because now memory gets allocated several times: once
for the byte[], a second time for the String, and a third time for the Integer or
other object in getObject(). Memory allocation takes a considerable
amount of time. But this stronger typing would remove some of the
flexibility of calling any getXXX on a field of any SQL type. And it would probably
make the querying itself (QueryExecutor.execute()) slower, I don't know
:/
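
To illustrate the allocation point: an integer could in principle be parsed straight from the received bytes, skipping the intermediate String. This is a rough sketch only. AsciiInt.parse is a hypothetical helper, not a driver method; it assumes the value arrives as ASCII decimal text in a byte[] and does not check for overflow.

```java
final class AsciiInt {
    // Parses an optionally signed decimal integer from ASCII bytes,
    // without creating a String first.
    // Assumption: no overflow checking; non-digit bytes are rejected.
    static int parse(byte[] data, int offset, int length) {
        if (length <= 0) {
            throw new NumberFormatException("empty input");
        }
        int i = offset;
        int end = offset + length;
        boolean negative = data[i] == '-';
        if (negative || data[i] == '+') {
            i++;
        }
        if (i == end) {
            throw new NumberFormatException("sign without digits");
        }
        int value = 0;
        for (; i < end; i++) {
            int digit = data[i] - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("bad digit at byte " + i);
            }
            value = value * 10 + digit;
        }
        return negative ? -value : value;
    }
}
```

A getInt() built this way would still work for a VARCHAR field that happens to contain digits, since it only looks at the bytes and not at the declared column type.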

Teofilis Martisius

Anyway, here is the patch to fix string decoding:

diff -r -u ./org/postgresql/core/Encoding.java /usr/src/postgresql-7.2.2fixed/src/interfaces/jdbc/org/postgresql/core/Encoding.java
--- ./org/postgresql/core/Encoding.java 2001-11-20 00:33:37.000000000 +0200
+++ /usr/src/postgresql-7.2.2fixed/src/interfaces/jdbc/org/postgresql/core/Encoding.java 2002-09-11 15:56:10.000000000 +0200
@@ -155,6 +155,9 @@
}
else
{
+ if (encoding.equals("UTF-8")) {
+ return decodeUTF8(encodedString, offset, length);
+ }
return new String(encodedString, offset, length, encoding);
}
}
@@ -163,6 +166,43 @@
throw new PSQLException("postgresql.stream.encoding", e);
}
}
+ /**
+ * custom byte[] -> String conversion routine, 3x-10x faster than standard new String(byte[])
+ */
+ static final int pow2_6 = 64; // 2^6
+ static final int pow2_12 = 4096; // 2^12
+ static char cdata[] = new char[50];
+
+ public static final String decodeUTF8(byte data[], int offset, int length) {
+ if (cdata.length < length) { // length is a byte count, as in new String(byte[], int, int, String)
+ cdata = new char[length];
+ }
+ int i = offset;
+ int j = 0;
+ int z, y, x, val;
+ while (i < offset + length) {
+ z = data[i] & 0xFF;
+ if (z < 0x80) {
+ cdata[j++] = (char)data[i];
+ i++;
+ } else if (z >= 0xE0) { // length == 3
+ y = data[i+1] & 0xFF;
+ x = data[i+2] & 0xFF;
+ val = (z-0xE0)*pow2_12 + (y-0x80)*pow2_6 + (x-0x80);
+ cdata[j++] = (char) val;
+ i+= 3;
+ } else { // length == 2 (maybe add checking for length > 3, throw exception if it is
+ y = data[i+1] & 0xFF;
+ val = (z - 0xC0)* (pow2_6)+(y-0x80);
+ cdata[j++] = (char) val;
+ i+=2;
+ }
+ }
+
+ String s = new String(cdata, 0, j);
+ return s;
+ }
+

/*
* Decode an array of bytes into a string.
