From: | Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com> |
---|---|
To: | Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com> |
Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: JSON data type status? |
Date: | 2011-01-21 08:58:53 |
Message-ID: | AANLkTi=ti88X2KR035_G28WJR+-EFTgMMQYpTd5YZw1=@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Jan 20, 2011 at 8:13 PM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> On Fri, Jan 21, 2011 at 09:11, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> What happened to our work to add a JSON data type for PG 9.1?
>
> Nothing will happen in 9.1.
> I assume we are in "competition" status:
>
> http://archives.postgresql.org/pgsql-hackers/2010-11/msg00481.php
>
> Also, if PGXN will work well, we might not have to include JSON
> in the core. We can download any JSON implementations from the
> site after installing the core server. Of course, if we will
> use JSON types in the core (EXPLAIN JSON output?), we have to
> include one of them.
Thanks for the update. However, I should probably take some blame as
well for spending basically zero time on the JSON datatype for lengths
at a time.
After someone emailed me asking how to install the JSON datatype, I
fixed up my module version of it:
http://git.postgresql.org/gitweb?p=json-datatype.git
I also have a version where JSON is a core datatype (the patches I
submitted to CommitFest), but it is obsolete now. The module version
has all the same features, but also fixes a Unicode bug* and adds
PostgreSQL 8.4 compatibility. For those who want the JSON datatype
right now, I recommend using the module.
I plan to maintain and improve the JSON module (repository linked
above), albeit at a snail's pace. Patches are certainly welcome. See
roadmap.markdown in the repository for some minor and not-so-minor
design decisions.
Joey Adams
* P.S.
The "Unicode bug" involved UTF-16 surrogate pair calculation. JSON
encodes Unicode characters that aren't in the Basic Multilingual Plane
the same way UTF-16 does: using two 4-digit hex codes. The correct
formula for determining the Unicode codepoint of a surrogate pair is:
unicode = 0x10000 + (((uc & 0x3FF) << 10) | (lc & 0x3FF));
where:
0xD800 <= uc <= 0xDBFF
0xDC00 <= lc <= 0xDFFF
For example, consider the JSON string "\uD835\uDD0D":
uc = 0xD835
lc = 0xDD0D
unicode = 0x1D50D
This pair of hex indices collapses into one Unicode codepoint, 0x1D50D
(or "𝔍").
I originally used this:
/* WRONG */
unicode = 0x10000 | ((uc & 0x3FF) << 10) | (lc & 0x3FF);
The problem is, ((uc & 0x3FF) << 10) | (lc & 0x3FF) has a range of 0
to 0xFFFFF, and in some of those numbers, `0x10000 |` adds zero.
I am sharing all this because I have made this mistake twice, and I
believe it is an easy mistake to make.
From | Date | Subject | |
---|---|---|---|
Next Message | Simon Riggs | 2011-01-21 09:10:47 | Re: SSI and Hot Standby |
Previous Message | Fujii Masao | 2011-01-21 08:35:07 | Re: pg_basebackup for streaming base backups |