Re: JSONB spaces in text presentation

From: Seref Arikan <serefarikan(at)gmail(dot)com>
To: obartunov(at)gmail(dot)com
Cc: Seref Arikan <serefarikan(at)gmail(dot)com>, pgsql <pgsql-general(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>
Subject: Re: JSONB spaces in text presentation
Date: 2014-09-24 20:41:39
Message-ID: CA+4ThdoJZN7g9v6CwbNWHebvTozQRiOioEEBjSntM_rWuLQnEg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thanks Oleg, I'll check the slides.

On Wed, Sep 24, 2014 at 8:07 PM, Oleg Bartunov <obartunov(at)gmail(dot)com> wrote:

> Check slides 17-20 of
> http://www.sai.msu.su/~megera/postgres/talks/hstore-dublin-2013.pdf to
> understand, what 'binary format' means. The slides describes binary storage
> for nested hstore, not jsonb, but you'll get the idea.
>
> On Wed, Sep 24, 2014 at 6:22 PM, Seref Arikan <serefarikan(at)gmail(dot)com>
> wrote:
>
>> This is interesting. Most binary encoding methods I use produce smaller
>> files than the text files for the same content.
>> Having read your mail, I've realized that I have no reason to accept the
>> same from the jsonb. I did a quick google search to see if it is wrong to
>> expect binary encoding to decrease size and saw that I'm not alone (which
>> still does not mean I'm being reasonable).
>> This project: http://ubjson.org/#size is one of the hits which mentions
>> some nice space gains thanks to binary encoding.
>>
>> The "much larger" part is a bit scary. Is this documented somewhere?
>>
>> Best regards
>> Seref
>>
>>
>> On Wed, Sep 24, 2014 at 2:44 PM, Merlin Moncure <mmoncure(at)gmail(dot)com>
>> wrote:
>>
>>> On Wed, Sep 24, 2014 at 2:44 AM, Ilya I. Ashchepkov <koctep(at)gmail(dot)com>
>>> wrote:
>>> > I'm sorry about sending email several times. I haven't understand, was
>>> it
>>> > sent by gmail or not.
>>> >
>>> >
>>> > On Wed, Sep 24, 2014 at 2:30 PM, John R Pierce <pierce(at)hogranch(dot)com>
>>> wrote:
>>> >>
>>> >> On 9/24/2014 12:23 AM, Ilya I. Ashchepkov wrote:
>>> >>>
>>> >>>
>>> >>> Is spaces is necessary in text presentation of JSONB?
>>> >>> In my data resulting text contains ~12% of spaces.
>>> >>
>>> >>
>>> >> can you show us an example of this?
>>> >
>>> >
>>> > One record
>>> > # select data from events.data limit 1;
>>> > {"can": {"lls": {"1": 76.4}, "mhs": 4674.85, "rpm": 168.888, "speed":
>>> 74,
>>> > "runned": 166855895, "fuel_consumption": 74213.5}, "crc": 10084,
>>> "gps": 1,
>>> > "gsm": {"signal": 100}, "lls": {"1": 733, "2": 717}, "used": 19,
>>> "speed":
>>> > 87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no": 256,
>>> "digital":
>>> > {"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0}, "out": {"1":
>>> 0,
>>> > "2": 0}}, "visible": 20, "ignition": 1, "location": {"course": 265,
>>> > "altitude": 143, "latitude": 55.127888997395836, "longitude":
>>> > 80.8046142578125}, "protocol": 4, "coldstart": 1, "timesource":
>>> "terminal",
>>> > "receiver_on": 1, "external_power": 28.07, "internal_power": 4.19}
>>> >
>>> > Whitespacis percents in this record:
>>> > # select array_length(regexp_split_to_array(data::text, text ' '),
>>> > 1)*100./length(data::text) from events.data limit 1;
>>> > ?column?
>>> > ---------------------
>>> > 12.3417721518987342
>>> >
>>> > Whitespace in test data
>>> > # select count(*),avg(array_length(regexp_split_to_array(data::text,
>>> text '
>>> > '), 1)*100./length(data::text)) from events.data ;
>>> > count | avg
>>> > --------+---------------------
>>> > 242222 | 12.3649234646118312
>>>
>>>
>>> For jsonb (unlike json), data is not actually stored as json but in a
>>> binary format. It will generally be much larger than the text
>>> representation in fact but in exchange for that many operations will
>>> be faster. The spaces you see are generated when the jsonb type is
>>> converted to text for output. I actually think it's pretty reasonable
>>> to want to redact all spaces from such objects in all cases where
>>> converstion to text happens (output functions, xxxto_json, etc)
>>> because ~12% savings are nothing to sneeze at when moving large
>>> documents in and out of the database.
>>>
>>> On the flip side, a more verbose prettification would be pretty nice
>>> too. I wonder if a hypothetical GUC is the best way to control this
>>> behavior...
>>>
>>> merlin
>>>
>>>
>>> --
>>> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-general
>>>
>>
>>
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Paul Ramsey 2014-09-24 21:01:40 spgist index not getting used
Previous Message Steve Dodd 2014-09-24 19:55:05 Re: Transaction completion timing