From: Mike Rylander <mrylander(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-29 00:23:06
Message-ID: b918cf3d1003281723q55a028fak545c71d459a25ef4@mail.gmail.com
Lists: pgsql-hackers
On Sun, Mar 28, 2010 at 7:36 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> Here's another thought. Given that JSON is actually specified to consist
>> of a string of Unicode characters, what will we deliver to the client
>> where the client encoding is, say Latin1? Will it actually be a legal
>> JSON byte stream?
>
> No, it won't. We will *not* be sending anything but latin1 in such a
> situation, and I really couldn't care less what the JSON spec says about
> it. Delivering wrongly-encoded data to a client is a good recipe for
> all sorts of problems, since the client-side code is very unlikely to be
> expecting that. A datatype doesn't get to make up its own mind whether
> to obey those rules. Likewise, data on input had better match
> client_encoding, because it's otherwise going to fail the encoding
> checks long before a json datatype could have any say in the matter.
>
> While I've not read the spec, I wonder exactly what "consist of a string
> of Unicode characters" should actually be taken to mean. Perhaps it
> only means that all the characters must be members of the Unicode set,
> not that the string can never be represented in any other encoding.
> There's more than one Unicode encoding anyway...
In practice, every parser/serializer I've used (including the one I
helped write) allows (and often forces) any non-ASCII character to be
encoded as \u followed by four hex digits.
Whether it would be easy inside the backend, when generating JSON from
user data stored in tables in a cluster that is not UTF-8 encoded, to
convert to UTF-8 is another matter entirely. If it /is/ easy and
safe, then it's just a matter of scanning for multi-byte sequences and
replacing those with their \uXXXX equivalents. I have some simple and
fast code I could share if it's needed, though I suspect it's not.
:)
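
For illustration only (this is not the code I mentioned above), a
minimal sketch of that replacement step in Python. It assumes the
input has already been decoded to Unicode; characters outside the
Basic Multilingual Plane are split into UTF-16 surrogate pairs, as
the \uXXXX escape form requires:

```python
def escape_non_ascii(s: str) -> str:
    """Replace every non-ASCII character with its JSON \\uXXXX escape."""
    out = []
    for ch in s:
        cp = ord(ch)
        if cp < 0x80:
            # Plain ASCII passes through unchanged.
            out.append(ch)
        elif cp <= 0xFFFF:
            # BMP character: a single \uXXXX escape.
            out.append('\\u%04x' % cp)
        else:
            # Astral character: encode as a UTF-16 surrogate pair.
            cp -= 0x10000
            out.append('\\u%04x\\u%04x'
                       % (0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)))
    return ''.join(out)
```

The result contains only ASCII bytes, so it survives transport in any
client_encoding that is an ASCII superset (latin1 included).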
UPDATE: Thanks, Robert, for pointing to the RFC.
--
Mike Rylander
| VP, Research and Design
| Equinox Software, Inc. / The Evergreen Experts
| phone: 1-877-OPEN-ILS (673-6457)
| email: miker(at)esilibrary(dot)com
| web: http://www.esilibrary.com