Fwd: Initial Review: JSON contrib modul was: Re: Another swing at JSON

From:	Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Fwd: Initial Review: JSON contrib modul was: Re: Another swing at JSON
Date:	2011-07-20 03:42:52
Message-ID:	CAARyMpA20PBNoS6Vv5yx8-7icfNqRNx=Rxfwj4LiYCk99ykLBg@mail.gmail.com
Lists:	pgsql-hackers

Forwarding because the mailing list rejected the original message.

---------- Forwarded message ----------
From: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Date: Tue, Jul 19, 2011 at 11:23 PM
Subject: Re: Initial Review: JSON contrib modul was: Re: [HACKERS]
Another swing at JSON
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert
Haas <robertmhaas(at)gmail(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>,
Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, David Fetter
<david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Pg Hackers
<pgsql-hackers(at)postgresql(dot)org>

On Tue, Jul 19, 2011 at 10:01 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Would it work to have a separate entry point into mbutils.c that lets
> you cache the conversion proc caller-side?

That sounds like a really good idea. There's still the overhead of
calling the proc, but I imagine it's a lot less than looking it up.

> I think the main problem is
> determining the byte length of each source character beforehand.

I'm not sure what you mean. The idea is to convert the \uXXXX escape
to UTF-8 with unicode_to_utf8 (the length of the resulting UTF-8
sequence is easy to compute), call the conversion proc to get the
null-terminated database-encoded character, then append the result to
whatever StringInfo the string is going into.
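The first step described above (turning a \uXXXX escape into a UTF-8 byte sequence of known length) can be sketched roughly as follows. This is a standalone illustration, not the PostgreSQL code: parse_u_escape, utf8_length and encode_utf8 are hypothetical names standing in for the parser and for unicode_to_utf8. The sketch ignores surrogate pairs, and it leaves out the call to the conversion proc and the StringInfo append because those depend on backend internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed to encode one code point in UTF-8. */
static size_t
utf8_length(uint32_t cp)
{
    if (cp < 0x80)
        return 1;
    if (cp < 0x800)
        return 2;
    if (cp < 0x10000)
        return 3;
    return 4;
}

/*
 * Hypothetical stand-in for unicode_to_utf8: write the UTF-8 form of cp
 * into out (which must hold at least 5 bytes), NUL-terminate it, and
 * return the number of bytes written before the terminator.
 */
static size_t
encode_utf8(uint32_t cp, unsigned char *out)
{
    size_t len = utf8_length(cp);

    assert(cp <= 0x10FFFF);

    switch (len)
    {
        case 1:
            out[0] = (unsigned char) cp;
            break;
        case 2:
            out[0] = (unsigned char) (0xC0 | (cp >> 6));
            out[1] = (unsigned char) (0x80 | (cp & 0x3F));
            break;
        case 3:
            out[0] = (unsigned char) (0xE0 | (cp >> 12));
            out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
            out[2] = (unsigned char) (0x80 | (cp & 0x3F));
            break;
        default:
            out[0] = (unsigned char) (0xF0 | (cp >> 18));
            out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
            out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
            out[3] = (unsigned char) (0x80 | (cp & 0x3F));
            break;
    }
    out[len] = '\0';
    return len;
}

/*
 * Hypothetical parser: read "\u" followed by exactly four hex digits.
 * Returns the code point, or -1 if the input is malformed or too short.
 */
static int32_t
parse_u_escape(const char *s)
{
    uint32_t cp = 0;
    int      i;

    if (s[0] != '\\' || s[1] != 'u')
        return -1;

    for (i = 2; i < 6; i++)
    {
        char c = s[i];

        cp <<= 4;
        if (c >= '0' && c <= '9')
            cp |= (uint32_t) (c - '0');
        else if (c >= 'a' && c <= 'f')
            cp |= (uint32_t) (c - 'a' + 10);
        else if (c >= 'A' && c <= 'F')
            cp |= (uint32_t) (c - 'A' + 10);
        else
            return -1;      /* also catches an early NUL terminator */
    }
    return (int32_t) cp;
}
```

In the scheme discussed in this thread, the NUL-terminated buffer filled by encode_utf8 would be what gets handed to the cached conversion proc; since a single \uXXXX escape never exceeds U+FFFF, its UTF-8 form is at most 3 bytes.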

The only question mark is how big the destination buffer will need to
be. The maximum number of bytes per char in any supported encoding is
4, but is it possible for one Unicode character to turn into multiple
"character"s in the database encoding?

While we're at it, should we provide the same capability to the SQL
parser? Namely, the ability to use \uXXXX escapes above U+007F when
the server encoding is not UTF-8?

- Joey

In response to

Re: Initial Review: JSON contrib modul was: Re: Another swing at JSON at 2011-07-20 02:01:09 from Alvaro Herrera

Responses

Re: Fwd: Initial Review: JSON contrib modul was: Re: Another swing at JSON at 2011-07-20 03:45:22 from Bruce Momjian