Re: additional json functionality

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
Cc: Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: additional json functionality
Date: 2013-11-15 20:25:24
Message-ID: CAHyXU0xa=1jJ3+4LJBRpyoFfUsz=7mQxQg8mdteAacbnNxJahw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 15, 2013 at 1:51 PM, David E. Wheeler <david(at)justatheory(dot)com> wrote:
> On Nov 15, 2013, at 6:35 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>
>> Here are the options on the table:
>> 1) convert existing json type to binary flavor (notwithstanding objections)
>> 2) maintain side by side types, one representing binary, one text.
>> unfortunately, i think the text one must get the name 'json' due to
>> unfortunate previous decision.
>> 3) merge the behaviors into a single type and get the best of both
>> worlds (as suggested upthread).
>>
>> I think we need to take a *very* hard look at #3 before exploring #1
>> or #2: Haven't through it through yet but it may be possible to handle
>> this in such a way that will be mostly transparent to the end user and
>> may have other benefits such as a faster path for serialization.
>
> If it’s possible to preserve order and still get the advantages of binary representation --- which are substantial (see http://theory.so/pg/2013/10/23/testing-nested-hstore/ and http://theory.so/pg/2013/10/25/indexing-nested-hstore/ for a couple of examples) --- without undue maintenance overhead, then great.
>
> I am completely opposed to duplicate key preservation in JSON, though. It has caused us a fair number of headaches at $work.

Kinda yes, kinda no. Here's a rough sketch of what I'm thinking:

*) 'json' type internally has a binary as well a text representation.
The text representation is basically the current type behavior
(duduplicated unordered). The binary representation is the hstore-ish
variant. The text mode is discarded when it's deemed no longer
appropriate to be needed, and, once gone, can never be rebuilt as it
was.

*) only the binary internal representation ever gets stored to disk
(or anything else).

*) the text mode is preferred for output if it is there. otherwise, a
deduplicated, reordered text representation is generated

*) When literal text is casted to json, the binary structure is built
up and kept alongside binary mode. So, if you went: 'select '{"a":
1, "a": 2}'::json', you'd get the same thing back. (This is how
it works now.). but, if you went: 'insert into foo select '{"a": 1,
"a": 2}'::json returning *', you'd get {"a": 2} back essentially
(although technically that would be a kind of race).

*) When the json is stored to table, the text representation gets
immediately discarded on the basis that it's no longer the true
representation of the data.

*) Ditto when making any equality operation (not as sure on this point).

*) Ditto when doing any operation that mutates the structure in any
way. the text representation is immutable except during serialization
and if it gets invalidated it gets destroyed.

*) New API function: json_simplify(); or some such. It reorders and
dedups from user's point of view (but really just kills off the text
representation)

*) once the text mode is gone, you get basically the proposed 'hstore' behavior.

*) serialization functions are generally used in contexts that do not
store anything but get output as query data. They create *only* the
text mode. However, if the resultant json is stored anywhere, the
text mode is destroyed and replaced with binary variant. This is both
a concession to the current behavior and an optimization of
'serialization-in-query' for which I think the binary mode is pessimal
performance wise. so, xxx_to_json serialization functions work
exactly as they do now which fixes my problem essentially.

*) if you are unhappy with duplicates in the above, just get use to
calling json_simpify() on the serialized json (or deal with in on the
client side).

This is all pretty glossy, but maybe there is a way forward...

merlin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2013-11-15 20:37:25 Re: additional json functionality
Previous Message Robert Haas 2013-11-15 20:21:57 Re: Errors on missing pg_subtrans/ files with 9.3