From: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
---|---|
To: | Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com> |
Cc: | Florian Pflug <fgp(at)phlo(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Initial Review: JSON contrib modul was: Re: Another swing at JSON |
Date: | 2011-07-20 02:01:09 |
Message-ID: | 1311126863-sup-7396@alvh.no-ip.org |
Lists: | pgsql-hackers |
Excerpts from Joey Adams's message of Tue Jul 19 21:03:15 -0400 2011:
> On Mon, Jul 18, 2011 at 7:36 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> > On Jul19, 2011, at 00:17 , Joey Adams wrote:
> >> I suppose a simple solution would be to convert all escapes and
> >> outright ban escapes of characters not in the database encoding.
> >
> > +1. Making JSON work like TEXT when it comes to encoding issues
> > makes this all much simpler conceptually. It also avoids all kinds
> > of weird issues if you extract textual values from a JSON document
> > server-side.
>
> Thanks for the input. I'm leaning in this direction too. However, it
> will be a tad tricky to implement the conversions efficiently, since
> the wchar API doesn't provide a fast path for individual codepoint
> conversion (that I'm aware of), and pg_do_encoding_conversion doesn't
> look like a good thing to call lots of times.
>
> My plan is to scan for escapes of non-ASCII characters, convert them
> to UTF-8, and put them in a comma-delimited string like this:
>
> a,b,c,d,
>
> then, convert the resulting string to the server encoding (which may
> fail, indicating that some codepoint(s) are not present in the
> database encoding). After that, read the string and plop the
> characters where they go.
Ugh.
> It's "clever", but I can't think of a better way to do it with the existing API.
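For illustration, the batching plan quoted above might look roughly like the following Python sketch. This is not PostgreSQL code: `convert_escapes` and `ESCAPE_RE` are hypothetical names, Python's codec machinery stands in for pg_do_encoding_conversion, and surrogate-pair escapes are deliberately left unhandled.

```python
import re

# Matches any JSON backslash escape, so that an escaped backslash
# followed by "u00e9" is not mistaken for a \u escape.
ESCAPE_RE = re.compile(r'\\(?:u([0-9a-fA-F]{4})|.)')


def _is_non_ascii_escape(match):
    """True for \\uXXXX escapes whose code point is above 0x7F."""
    return bool(match.group(1)) and int(match.group(1), 16) > 0x7F


def convert_escapes(json_text, server_encoding):
    """Replace non-ASCII \\uXXXX escapes with bytes in server_encoding.

    Raises UnicodeEncodeError if an escaped code point does not exist
    in the target encoding.
    """
    # Pass 1: collect the escaped characters in document order.
    pending = [
        chr(int(m.group(1), 16))
        for m in ESCAPE_RE.finditer(json_text)
        if _is_non_ascii_escape(m)
    ]

    # Build the comma-delimited UTF-8 buffer ("a,b,c,d,").
    utf8_batch = "".join(ch + "," for ch in pending).encode("utf-8")

    # One conversion for the whole batch instead of one per character.
    converted = utf8_batch.decode("utf-8").encode(server_encoding)
    pieces = iter(converted.split(b",")[:-1])

    # Pass 2: copy the text and drop each converted character in place.
    out = bytearray()
    pos = 0
    for m in ESCAPE_RE.finditer(json_text):
        if _is_non_ascii_escape(m):
            out += json_text[pos:m.start()].encode(server_encoding)
            out += next(pieces)
            pos = m.end()
    out += json_text[pos:].encode(server_encoding)
    return bytes(out)
```

Splitting on the comma byte is only safe when the target encoding never uses an ASCII byte as part of a multibyte character, which this sketch assumes.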
Would it work to have a separate entry point into mbutils.c that lets
you cache the conversion proc caller-side? I think the main problem is
determining the byte length of each source character beforehand.
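As a rough analogy only, here is what a caller-side cached converter could look like in Python. Nothing here is the mbutils.c API: `make_converter`, `convert_char` and `utf8_len_from_lead_byte` are hypothetical names, and `codecs.getencoder` merely plays the role of the conversion proc that is looked up once. The sketch assumes the source bytes are valid UTF-8, in which case the lead byte alone gives the character's length.

```python
import codecs


def utf8_len_from_lead_byte(lead):
    """Return the length in bytes of a UTF-8 sequence from its lead byte."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    raise ValueError("not a valid UTF-8 lead byte: 0x%02x" % lead)


def make_converter(server_encoding):
    """Look up the encoder once and return a per-character converter."""
    encode = codecs.getencoder(server_encoding)  # cached by the caller

    def convert_char(utf8, pos):
        # Determine the source character's byte length up front, then
        # convert exactly that many bytes with the cached encoder.
        n = utf8_len_from_lead_byte(utf8[pos])
        ch = utf8[pos:pos + n].decode("utf-8")
        out, _ = encode(ch)
        return out, pos + n

    return convert_char
```

A caller would build the converter once per document and then invoke `convert_char` for each escape, so the encoding lookup is not repeated for every character.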
--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support