From: Florian Pflug <fgp(at)phlo(dot)org>
To: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Initial Review: JSON contrib modul was: Re: Another swing at JSON
Date: 2011-07-18 23:36:31
Message-ID: 6D8C16B0-E98C-48B5-899A-6566C5E9A0AD@phlo.org
Lists: pgsql-hackers
On Jul 19, 2011, at 00:17, Joey Adams wrote:
> I suppose a simple solution would be to convert all escapes and
> outright ban escapes of characters not in the database encoding.
+1. Making JSON work like TEXT when it comes to encoding issues
makes this all much simpler conceptually. It also avoids all kinds
of weird issues if you extract textual values from a JSON document
server-side.
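
To make that a bit more concrete, here's a rough sketch of what the unescaping path could look like. This is not the contrib module's code, just an illustration: it leans on the backend's existing conversion helpers (unicode_to_utf8(), pg_utf_mblen(), pg_any_to_server()), and the function name and the assumption that the caller has already parsed the four hex digits are mine.

    #include "postgres.h"
    #include "lib/stringinfo.h"
    #include "mb/pg_wchar.h"

    /*
     * Sketch only: decode one \uXXXX escape into the server encoding,
     * erroring out if the character has no representation there - the
     * same behaviour you get when storing that character in a TEXT column.
     */
    static void
    json_append_unescaped(StringInfo buf, pg_wchar codepoint)
    {
        char    utf8buf[8];
        int     utf8len;
        char   *converted;

        /* Encode the code point as UTF-8 first ... */
        unicode_to_utf8(codepoint, (unsigned char *) utf8buf);
        utf8len = pg_utf_mblen((unsigned char *) utf8buf);
        utf8buf[utf8len] = '\0';

        /*
         * ... then convert to the database encoding.  pg_any_to_server()
         * raises an error for characters the server encoding cannot
         * represent, which is exactly the "outright ban" proposed above.
         */
        converted = pg_any_to_server(utf8buf, utf8len, PG_UTF8);
        appendStringInfoString(buf, converted);
        if (converted != utf8buf)
            pfree(converted);
    }

If the server encoding is UTF-8 the conversion is a no-op, so the common case stays cheap; only non-UTF-8 databases pay for the conversion and get the error.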
If we really need more flexibility than that, we should look at
ways to allow different columns to have different encodings. Doing
that just for JSON seems wrong - especially because it doesn't really
reduce the complexity of the problem, as your example shows. The
essential problem here is, AFAICS, that there's really no sane way to
compare strings in two different encodings, unless both encode only a
subset of Unicode.
> This would have the nice property that all strings can be unescaped
> server-side. Problem is, what if a browser or other program produces,
> say, \u00A0 (NO-BREAK SPACE), and tries to insert it into a database
> where the encoding lacks this character?
They'll get an error - just as if they had tried to store that same
character in a TEXT column.
> On the other hand, converting all JSON to UTF-8 would be simpler to
> implement. It would probably be more intuitive, too, given that the
> JSON RFC says, "JSON text SHALL be encoded in Unicode."
Yet, the only reason I'm aware of for some people not to use UTF-8
as the server encoding is that it's pretty inefficient storage-wise for
some scripts (AFAIR some Japanese scripts are an example, but I don't
remember the details). By making JSON always store UTF-8 on disk, the
JSON type gets less appealing to those people.
best regards,
Florian Pflug