Re: pgsql: Handle Unicode surrogate pairs correctly when processing JSON.

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-committers(at)postgresql(dot)org
Subject: Re: pgsql: Handle Unicode surrogate pairs correctly when processing JSON.
Date: 2013-06-24 17:58:12
Message-ID: 51C888B4.3070806@dunslane.net
Lists: pgsql-committers


On 06/24/2013 11:50 AM, Bruce Momjian wrote:
> On Sat, Jun 8, 2013 at 01:21:20PM +0000, Andrew Dunstan wrote:
>> Handle Unicode surrogate pairs correctly when processing JSON.
>>
>> In 9.2, Unicode escape sequences are not analysed at all other than
>> to make sure that they are in the form \uXXXX. But in 9.3 many of the
>> new operators and functions try to turn JSON text values into text in
>> the server encoding, and this includes de-escaping Unicode escape
>> sequences. This processing had not taken into account the possibility
>> that this might contain a surrogate pair to designate a character
>> outside the BMP. That is now handled correctly.
>>
>> This also enforces correct use of surrogate pairs, something that is not
>> done by the type's input routines. This fact is noted in the docs.
>>
>> Branch
>> ------
>> master
>>
>> Details
>> -------
>> http://git.postgresql.org/pg/commitdiff/94e3311b97448324d67ba9a527854271373329d9
>>
>> Modified Files
>> --------------
>> doc/src/sgml/func.sgml | 9 +++++++
>> src/backend/utils/adt/json.c | 52 ++++++++++++++++++++++++++++++++++++
>> src/test/regress/expected/json.out | 23 ++++++++++++++++
>> src/test/regress/sql/json.sql | 8 ++++++
>> 4 files changed, 92 insertions(+)
> Does this affect any data already stored in PG 9.3 beta? Is it
> something that should require a catalog bump?
>

No and no. All it means is that where we previously extracted data
encoded with surrogate pairs incorrectly, we now extract it correctly.
Only the processing functions enforce this; for legacy reasons the input
routines don't enforce correct use of surrogate pairs (or indeed of any
Unicode escapes), as long as they are in the form \uxxxx.

cheers

andrew
