Re: Very Limited Toast Compression on JSONB (9.4 beta 2)

From: Larry White <ljw1001(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Very Limited Toast Compression on JSONB (9.4 beta 2)
Date: 2014-08-01 18:34:31
Message-ID: CAMdbzVgfT5dpoh_E00XR785yBs94Ed_wqkDUUNRh4jvTiqUrbQ@mail.gmail.com
Lists: pgsql-general

Reported as bug #11109.

On Fri, Aug 1, 2014 at 1:46 PM, Larry White <ljw1001(at)gmail(dot)com> wrote:

> Jeff,
>
> Thank you for your help. This is a Postgres bug, but I don't think I'd
> have figured it out without you.
>
> What is happening is that if Postgres can, after compression, fit the
> entire 'document' into a single row/page in the TOAST table, it stores it
> compressed. However, if the document is still too big to fit in one row
> after compression, it does no compression at all. This is why it worked
> for you, but not for me.
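>
> A quick way to see the effect from SQL is to compare the stored size of a
> row against its raw text length (the table and column names below are just
> placeholders for my test setup):
>
> -- pg_column_size() reports the stored (possibly compressed) size;
> -- octet_length(doc::text) approximates the uncompressed size.
> SELECT pg_column_size(doc) AS stored_bytes,
>        octet_length(doc::text) AS text_bytes
> FROM docs
> LIMIT 5;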
>
> I create my test file (in part) with this loop:
>
> for (int j = 0; j < 110; j++) {
>     mediumPayload.getJunk().add(
>         "124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas");
>     mediumPayload.getJunk().add(
>         "q4535 wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs sdfsd");
> }
>
> If the loop runs 110 times as shown, the value is compressed.
> If it runs 111 times, it is not:
>
> With 110 iterations:
>   Extended: 8192 bytes (one page)
>   External: 66 MB
>
> With 111 iterations:
>   Extended: 69 MB
>   External: 69 MB
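>
> For reference, sizes like the ones above can be read with something along
> these lines (the table name is a placeholder; reltoastrelid points at the
> table's TOAST relation):
>
> -- Total size of the TOAST relation backing the test table.
> SELECT pg_size_pretty(pg_total_relation_size(reltoastrelid))
> FROM pg_class
> WHERE relname = 'docs';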
>
> Hopefully they can fix this before the GA release.
>
>
> On Fri, Aug 1, 2014 at 12:38 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
>> On Thu, Jul 31, 2014 at 11:36 PM, Larry White <ljw1001(at)gmail(dot)com> wrote:
>>
>>>
>>> On Fri, Aug 1, 2014 at 2:20 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>>
>>>> On Thursday, July 31, 2014, Larry White <ljw1001(at)gmail(dot)com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm running an experiment on 9.4 beta 2.
>>>>>
>>>>> I put 275,000 identical JSON files into a table using JSONB (one per
>>>>> row). Each raw text file is 251K in size, so the total uncompressed is
>>>>> 69GB. The column storage is set to EXTENDED. There are other toastable
>>>>> columns in the table, but none have more than 36 bytes of data in them.
>>>>>
>>>>> My TOAST table is 66GB. I would have expected at least that much
>>>>> reduction just from JSONB being a binary format.
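>>>>>
>>>>> For context, the setup was essentially this (a simplified, placeholder
>>>>> version of my actual schema):
>>>>>
>>>>> -- Placeholder test table; one JSON document per row.
>>>>> CREATE TABLE docs (id bigserial PRIMARY KEY, doc jsonb);
>>>>> -- EXTENDED is already the default for jsonb; set explicitly here.
>>>>> ALTER TABLE docs ALTER COLUMN doc SET STORAGE EXTENDED;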
>>>>>
>>>>> If I compress one of these JSON files outside of Postgres, it goes
>>>>> from 251K to 1K.
>>>>>
>>>>
>>>> That is an astonishing amount of compression. Are you really
>>>> compressing one of the files in isolation, rather than co-compressing many
>>>> and then pro-rating the result?
>>>>
>>>
>>> Yes, I should have explained. These are generated JSON files for testing,
>>> and there is massive repetition in them, which is why they compress so
>>> well outside of Postgres. (Basically, there is a repeating array of the
>>> same string.) I did compress just the one file.
>>>
>>>
>>>>
>>>> Can you provide an example of the data, and the command line you used
>>>> to compress it?
>>>>
>>>
>>> Compressed on a Mac with the Compress UI option. Here's a brief sample
>>> from the file. You can see why it compresses so well:
>>> "{\"junk\":[\"124245etweetwet345gwtretwt43 qwrqwq qwre qw
>>> rsdflkas\",\"q4535 wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs
>>> sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\",\"q4535
>>> wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs
>>> sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\",\"q4535
>>> wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs
>>> sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\",\"q4535
>>> wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs
>>> sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\",\"q4535
>>> wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs
>>> sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\",\"q4535
>>>
>>> and so on.
>>>
>>
>> If I take that example (and cap off the array and hash right after the
>> end of what you show, and remove the escapes of the double quote marks)
>> then it does not compress, but only because it is not long enough to
>> trigger the compression attempts.
>>
>> If I repeat the array portion 4 more times to make the whole thing long
>> enough for compression to be used, it compresses nicely. Not 100-fold (but
>> then again, neither does bzip2 or gzip on the data I just described), but
>> text and json compress 10-fold and jsonb 5-fold.
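>>
>> A self-contained way to try this sort of comparison (the repeat count is
>> arbitrary, and the temp table is just for illustration):
>>
>> -- Build a long jsonb array of one repeated string; compression only
>> -- happens when the value is actually stored in a table.
>> CREATE TEMP TABLE t (doc jsonb);
>> INSERT INTO t
>>   SELECT ('[' ||
>>           rtrim(repeat('"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas",',
>>                        500), ',') ||
>>           ']')::jsonb;
>> -- pg_column_size() on the stored row reflects any TOAST compression.
>> SELECT pg_column_size(doc) AS stored_bytes,
>>        octet_length(doc::text) AS raw_text_bytes
>> FROM t;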
>>
>> Cheers,
>>
>> Jeff
>>
>
>
