From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | pgsql-committers(at)postgresql(dot)org |
Subject: | Re: pgsql: Handle Unicode surrogate pairs correctly when processing JSON. |
Date: | 2013-06-24 17:58:12 |
Message-ID: | 51C888B4.3070806@dunslane.net |
Lists: | pgsql-committers |
On 06/24/2013 11:50 AM, Bruce Momjian wrote:
> On Sat, Jun 8, 2013 at 01:21:20PM +0000, Andrew Dunstan wrote:
>> Handle Unicode surrogate pairs correctly when processing JSON.
>>
>> In 9.2, Unicode escape sequences are not analysed at all other than
>> to make sure that they are in the form \uXXXX. But in 9.3 many of the
>> new operators and functions try to turn JSON text values into text in
>> the server encoding, and this includes de-escaping Unicode escape
>> sequences. This processing had not taken into account the possibility
>> that this might contain a surrogate pair to designate a character
>> outside the BMP. That is now handled correctly.
>>
>> This also enforces correct use of surrogate pairs, something that is not
>> done by the type's input routines. This fact is noted in the docs.
>>
>> Branch
>> ------
>> master
>>
>> Details
>> -------
>> http://git.postgresql.org/pg/commitdiff/94e3311b97448324d67ba9a527854271373329d9
>>
>> Modified Files
>> --------------
>> doc/src/sgml/func.sgml | 9 +++++++
>> src/backend/utils/adt/json.c | 52 ++++++++++++++++++++++++++++++++++++
>> src/test/regress/expected/json.out | 23 ++++++++++++++++
>> src/test/regress/sql/json.sql | 8 ++++++
>> 4 files changed, 92 insertions(+)
> Does this affect any data already stored in PG 9.3 beta? Is it
> something that should require a catalog bump?
>
No and no. All it means is that where we previously extracted data
encoded with surrogate pairs incorrectly, we now do it correctly. Only
the processing functions enforce this. For legacy reasons the input
routines don't enforce correct use of surrogate pairs, or indeed of any
Unicode escapes, as long as they are in the form \uXXXX.
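
To illustrate the difference between checking only the \uXXXX form and actually de-escaping, here is a minimal Python sketch. It is not the code in json.c; the function names are hypothetical, and it deliberately ignores every other JSON escape (including an escaped backslash before a `u`), so treat it as an illustration of the surrogate-pair rule only.

```python
import re

# Matches one escape of the form \uXXXX (four hex digits).
ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def is_high_surrogate(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit: int) -> bool:
    return 0xDC00 <= unit <= 0xDFFF


def deescape_unicode(text: str) -> str:
    """De-escape \\uXXXX sequences, enforcing correct surrogate pairing.

    Simplification (assumption): only \\uXXXX escapes are handled; other
    JSON escapes such as \\\\ or \\n are copied through untouched.
    """
    out = []
    i = 0
    while i < len(text):
        m = ESCAPE.match(text, i)
        if m is None:
            # Ordinary character: copy it through.
            out.append(text[i])
            i += 1
            continue
        unit = int(m.group(1), 16)
        i = m.end()
        if is_high_surrogate(unit):
            # A high surrogate must be followed immediately by a low one.
            m2 = ESCAPE.match(text, i)
            if m2 is None or not is_low_surrogate(int(m2.group(1), 16)):
                raise ValueError("high surrogate must be followed by a low surrogate")
            low = int(m2.group(1), 16)
            # Combine the pair into a single code point outside the BMP.
            out.append(chr(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)))
            i = m2.end()
        elif is_low_surrogate(unit):
            raise ValueError("low surrogate must follow a high surrogate")
        else:
            # Plain BMP character.
            out.append(chr(unit))
    return "".join(out)


# A well-formed pair decodes to one character (U+1F604).
print(hex(ord(deescape_unicode(r"\ud83d\ude04"))))

# A lone high surrogate has the right form but is rejected when de-escaped.
lone = r"\ud83d"
print(ESCAPE.fullmatch(lone) is not None)
try:
    deescape_unicode(lone)
except ValueError as exc:
    print("rejected:", exc)
```

In this sketch the form check stands in for what the input routines accept, and `deescape_unicode` stands in for the stricter check done by the processing functions.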
cheers
andrew