Re: JSONB spaces in text presentation

From: Seref Arikan <serefarikan(at)gmail(dot)com>
To: pgsql <pgsql-general(at)postgresql(dot)org>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>
Subject: Re: JSONB spaces in text presentation
Date: 2014-09-24 14:22:00
Message-ID: CA+4ThdrvVqxmN-sG+P_nSSFe5mHgi=zwu_-LvLgnXmMqu81p2g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

This is interesting. Most binary encoding methods I use produce smaller
files than the text files for the same content.
Having read your mail, I've realized that I have no reason to accept the
same from the jsonb. I did a quick google search to see if it is wrong to
expect binary encoding to decrease size and saw that I'm not alone (which
still does not mean I'm being reasonable).
This project: http://ubjson.org/#size is one of the hits which mentions
some nice space gains thanks to binary encoding.

The "much larger" part is a bit scary. Is this documented somewhere?

Best regards
Seref

On Wed, Sep 24, 2014 at 2:44 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:

> On Wed, Sep 24, 2014 at 2:44 AM, Ilya I. Ashchepkov <koctep(at)gmail(dot)com>
> wrote:
> > I'm sorry about sending email several times. I haven't understand, was it
> > sent by gmail or not.
> >
> >
> > On Wed, Sep 24, 2014 at 2:30 PM, John R Pierce <pierce(at)hogranch(dot)com>
> wrote:
> >>
> >> On 9/24/2014 12:23 AM, Ilya I. Ashchepkov wrote:
> >>>
> >>>
> >>> Is spaces is necessary in text presentation of JSONB?
> >>> In my data resulting text contains ~12% of spaces.
> >>
> >>
> >> can you show us an example of this?
> >
> >
> > One record
> > # select data from events.data limit 1;
> > {"can": {"lls": {"1": 76.4}, "mhs": 4674.85, "rpm": 168.888, "speed": 74,
> > "runned": 166855895, "fuel_consumption": 74213.5}, "crc": 10084, "gps":
> 1,
> > "gsm": {"signal": 100}, "lls": {"1": 733, "2": 717}, "used": 19, "speed":
> > 87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no": 256,
> "digital":
> > {"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0}, "out": {"1": 0,
> > "2": 0}}, "visible": 20, "ignition": 1, "location": {"course": 265,
> > "altitude": 143, "latitude": 55.127888997395836, "longitude":
> > 80.8046142578125}, "protocol": 4, "coldstart": 1, "timesource":
> "terminal",
> > "receiver_on": 1, "external_power": 28.07, "internal_power": 4.19}
> >
> > Whitespacis percents in this record:
> > # select array_length(regexp_split_to_array(data::text, text ' '),
> > 1)*100./length(data::text) from events.data limit 1;
> > ?column?
> > ---------------------
> > 12.3417721518987342
> >
> > Whitespace in test data
> > # select count(*),avg(array_length(regexp_split_to_array(data::text,
> text '
> > '), 1)*100./length(data::text)) from events.data ;
> > count | avg
> > --------+---------------------
> > 242222 | 12.3649234646118312
>
>
> For jsonb (unlike json), data is not actually stored as json but in a
> binary format. It will generally be much larger than the text
> representation in fact but in exchange for that many operations will
> be faster. The spaces you see are generated when the jsonb type is
> converted to text for output. I actually think it's pretty reasonable
> to want to redact all spaces from such objects in all cases where
> converstion to text happens (output functions, xxxto_json, etc)
> because ~12% savings are nothing to sneeze at when moving large
> documents in and out of the database.
>
> On the flip side, a more verbose prettification would be pretty nice
> too. I wonder if a hypothetical GUC is the best way to control this
> behavior...
>
> merlin
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2014-09-24 14:34:44 Re: JSONB spaces in text presentation
Previous Message Adrian Klaver 2014-09-24 13:51:03 Re: JSONB spaces in text presentation