[Pljava-dev] PL/java kills unicode chars?

From: chap at anastigmatix(dot)net (Chapman Flack)
Subject: [Pljava-dev] PL/java kills unicode chars?
Date: 2015-09-20 03:41:04
Message-ID: 55FE2AD0.1040305@anastigmatix.net
Lists: pljava-dev

Srivatsan Ramanujam wrote:
> I believe PL/java is killing unicode characters (it is probably converting
> text to a byte stream and reading them as single byte characters - perhaps
> Latin-1 and not as UTF-8). ...

Well, this one has been open for a while. It's also in the GitHub tracker as
https://github.com/tada/pljava/issues/21, which I've just updated with
test code that confirms the problem.

I think vatsan's comment (on the GitHub issue) about
http://bugs.sun.com/view_bug.do?bug_id=5030776 is probably spot on.
We seem to handle the whole Basic Multilingual Plane correctly
(nearly 64k code points); it's only the characters in the planes above it
that get messed up.
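
To illustrate why only the higher planes would be affected, here is a minimal
sketch in plain Java. It is not PL/Java code. It assumes the cause is the
"modified UTF-8" form that JNI's string functions use, and it borrows
DataOutputStream.writeUTF, which is documented to write that same modified
form, to stand in for them. The class and method names (ModifiedUtf8Demo,
standardUtf8, modifiedUtf8) are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedUtf8Demo {

    // Real UTF-8, which is what a UTF8-encoded PostgreSQL database stores.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Java's "modified UTF-8", produced here with DataOutputStream.writeUTF
    // as a stand-in (an assumption of this sketch) for the JNI functions.
    // writeUTF prepends a 2-byte length, which is dropped before returning.
    static byte[] modifiedUtf8(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) throws IOException {
        // A character inside the Basic Multilingual Plane.
        String bmp = "\u00e9";
        // A supplementary character (U+1F600), stored in Java as a
        // surrogate pair of two chars.
        String supplementary = new String(Character.toChars(0x1F600));

        System.out.println("BMP encodings equal: "
            + Arrays.equals(standardUtf8(bmp), modifiedUtf8(bmp)));
        System.out.println("Supplementary, standard UTF-8 bytes: "
            + standardUtf8(supplementary).length);
        System.out.println("Supplementary, modified UTF-8 bytes: "
            + modifiedUtf8(supplementary).length);
    }
}
```

For the BMP character the two encodings come out byte-for-byte identical. For
the supplementary one, real UTF-8 is a single 4-byte sequence, while the
modified form encodes each surrogate separately as 3 bytes, giving 6 bytes
that a strict UTF-8 reader will not accept as that character.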

Should be a straightforward fix, once I've found how many places those
not-really-UTF JNI functions are really used in the code.

-Chap
