Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Terry Laurenzo <tj(at)laurenzo(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-10-19 21:59:55
Message-ID: AANLkTimEcNATebYGkHMEiQksfie4xhufe-XS+mK==jLQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> After spending a week in the morass of this, I have to say that I am less
> certain than I was on any front regarding the text/binary distinction.  I'll
> take some time and benchmark different cases.  My hypothesis is that a well
> implemented binary structure and conversions will add minimal overhead in
> the IO + Validate case which would be the typical in/out flow.  It could be
> substantially faster for binary send/receive because the validation step
> could be eliminated/reduced.  Further storing as binary reduces the overhead
> of random access to the data by database functions.
>
> I'm envisioning staging this up as follows:
>    1. Create a "jsontext".  jsontext uses text as its internal
> representation.  in/out functions are essentially a straight copy or a copy
> + validate.
>    2. Create a "jsonbinary" type.  This uses an optimized binary format for
> internal rep and send/receive.  in/out is a parse/transcode operation to
> standard JSON text.
>    3. Internal data access functions and JSON Path require a jsonbinary.
>    4. There are implicit casts to/from jsontext and jsonbinary.

some years ago I solved similar problems with xml type. I think, so
you have to calculate with two factors:

a) all varlena types are compressed - you cannot to get some
interesting fragment or you cannot tu update some interesting
fragment, every time pg working with complete document

b) access to some fragment of JSON or XML document are not really
important, because fast access to data are solved via indexes.

c) only a few API allows binary communication between server/client.
Almost all interfases use only text based API. I see some possible
interesting direction for binary protocol when some one uses a
javascript driver, when some one use pg in some javascript server side
environment, but it isn't a often used now.

Regards

Pavel

>
> I've got a grammar in mind for the binary structure that I'll share later
> when I've got some more time.  It's inspired by $COMPETITOR's format but a
> little more sane, using type tags that implicitly define the size of the
> operands, simplifying parsing.
>
> I'll then define the various use cases and benchmark using the different
> types.  Some examples include such as IO No Validate, IO+Validate, Store and
> Index, Internal Processing, Internal Composition, etc.
>
> The answer may be to have both a jsontext and jsonbinary type as each will
> be optimized for a different case.
>
> Make sense?  It may be a week before I get through this.
> Terry
>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2010-10-19 22:03:35 Re: pg_rawdump
Previous Message Greg Stark 2010-10-19 21:57:41 Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)