From: | Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com> |
---|---|
To: | Abhijit Menon-Sen <ams(at)toroid(dot)org> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <janwieck(at)yahoo(dot)com> |
Subject: | Re: JSON for PG 9.2 |
Date: | 2012-01-31 20:47:05 |
Message-ID: | CAARyMpAW9N+6_kcob-R=DMP3HyvXXduA3kAecsocLPNZq04CyQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Jan 31, 2012 at 1:29 PM, Abhijit Menon-Sen <ams(at)toroid(dot)org> wrote:
> At 2012-01-31 12:04:31 -0500, robertmhaas(at)gmail(dot)com wrote:
>>
>> That fails to answer the question of what we ought to do if we get an
>> invalid sequence there.
>
> I think it's best to categorically reject invalid surrogates as early as
> possible, considering the number of bugs that are related to them (not
> in Postgres, just in general). I can't see anything good coming from
> letting them in and leaving them to surprise someone in future.
>
> -- ams
+1
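For illustration only (this is not from the original message or from any PostgreSQL patch), rejecting invalid surrogates early could look like the sketch below. It scans the still-escaped body of a JSON string and reports any \uXXXX surrogate escape that is not part of a well-formed high/low pair. The function name `has_invalid_surrogate` and the regular expression are hypothetical.

```python
import re

# Matches one JSON escape at a time: either \uXXXX (capturing the hex digits)
# or a backslash followed by any single character (e.g. \\ or \n).
_ESCAPE = re.compile(r'\\(?:u([0-9a-fA-F]{4})|.)', re.DOTALL)


def has_invalid_surrogate(raw: str) -> bool:
    """Return True if the escaped JSON string body has an unpaired surrogate."""
    pending_end = None  # end offset of a high surrogate still awaiting its low half
    for m in _ESCAPE.finditer(raw):
        code = int(m.group(1), 16) if m.group(1) else None
        is_high = code is not None and 0xD800 <= code <= 0xDBFF
        is_low = code is not None and 0xDC00 <= code <= 0xDFFF
        if pending_end is not None:
            # The low surrogate must follow the high surrogate immediately.
            if not (is_low and m.start() == pending_end):
                return True
            pending_end = None
            continue
        if is_low:
            return True  # low surrogate with no preceding high surrogate
        if is_high:
            pending_end = m.end()
    # A high surrogate at the end, or followed only by plain text, is unpaired.
    return pending_end is not None
```

Consuming escapes one at a time means an escaped backslash followed by literal text such as `ud83d` is not mistaken for a surrogate escape.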
Another sequence to beware of is \u0000. While escaped NUL characters
are perfectly valid in JSON, NUL characters aren't allowed in TEXT
values. This means not all JSON strings can be converted to TEXT,
even in UTF-8. This may also complicate collation, if comparison
functions demand null-terminated strings.
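A small sketch of the problem, using Python's standard `json` and `ctypes` modules rather than anything in PostgreSQL: the escape is accepted by a JSON parser and decodes to a real NUL character, but the result is cut short as soon as it is treated as a null-terminated C string. The variable names are illustrative.

```python
import ctypes
import json

# "\u0000" is a valid JSON escape; the decoded string contains a real NUL.
decoded = json.loads('"a\\u0000b"')

# Encoding to UTF-8 keeps the NUL as a single zero byte.
utf8 = decoded.encode("utf-8")

# Reading the same bytes back as a null-terminated C string stops at the
# first zero byte, so everything after the NUL is silently lost.
as_c_string = ctypes.create_string_buffer(utf8).value
```

Here `decoded` has three characters, while `as_c_string` holds only the part before the NUL.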
I'm mostly in favor of allowing \u0000. Banning \u0000 means users
can't use JSON strings to marshal binary blobs, e.g. by escaping
non-printable characters and only using U+0000..U+00FF. Instead, they
have to use base64 or similar.
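As a rough sketch of that marshalling scheme (an assumption about how a user might do it, not an existing API), each byte can be mapped to the code point with the same value, which is exactly what the Latin-1 codec does. The helper names `blob_to_json` and `json_to_blob` are hypothetical.

```python
import json


def blob_to_json(blob: bytes) -> str:
    """Encode bytes as a JSON string using only U+0000..U+00FF."""
    # Latin-1 maps byte value N to code point U+00NN, one to one.
    # json.dumps escapes control and non-ASCII characters as \u00XX.
    return json.dumps(blob.decode("latin-1"))


def json_to_blob(text: str) -> bytes:
    """Decode a JSON string produced by blob_to_json back into bytes."""
    return json.loads(text).encode("latin-1")
```

A blob containing a zero byte produces a `\u0000` escape, so the round trip only works if the JSON implementation accepts that escape.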
Banning \u0000 doesn't quite violate the RFC:

    An implementation may set limits on the length and character
    contents of strings.

-Joey