From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Josh Berkus <josh(at)agliodbs(dot)com> |
Cc: | Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Jan Wieck <jan(at)wi3ck(dot)info> |
Subject: | Re: jsonb format is pessimal for toast compression |
Date: | 2014-09-26 03:10:09 |
Message-ID: | 16663.1411701009@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I wrote:
> The "offsets-and-lengths" patch seems like the approach we ought to
> compare to my patch, but it looks pretty unfinished to me: AFAICS it
> includes logic to understand offsets sprinkled into a mostly-lengths
> array, but no logic that would actually *store* any such offsets,
> which means it's going to act just like my patch for performance
> purposes.
> In the interests of pushing this forward, I will work today on
> trying to finish and review Heikki's offsets-and-lengths patch
> so that we have something we can do performance testing on.
> I doubt that the performance testing will tell us anything we
> don't expect, but we should do it anyway.
I've now done that, and attached is what I think would be a committable
version. Having done this work, I no longer think that this approach
is significantly messier code-wise than the all-lengths version, and
it does have the merit of not degrading on very large objects/arrays.
So at the moment I'm leaning to this solution not the all-lengths one.
To get a sense of the compression effects of varying the stride distance,
I repeated the compression measurements I'd done on 14 August with Pavel's
geometry data (<24077(dot)1408052877(at)sss(dot)pgh(dot)pa(dot)us>). The upshot of that was
min max avg
external text representation 220 172685 880.3
JSON representation (compressed text) 224 78565 541.3
pg_column_size, JSONB HEAD repr. 225 82540 639.0
pg_column_size, all-lengths repr. 225 66794 531.1
Here's what I get with this patch and different stride distances:
JB_OFFSET_STRIDE = 8 225 68551 559.7
JB_OFFSET_STRIDE = 16 225 67601 552.3
JB_OFFSET_STRIDE = 32 225 67120 547.4
JB_OFFSET_STRIDE = 64 225 66886 546.9
JB_OFFSET_STRIDE = 128 225 66879 546.9
JB_OFFSET_STRIDE = 256 225 66846 546.8
So at least for that test data, 32 seems like the sweet spot.
We are giving up a couple percent of space in comparison to the
all-lengths version, but this is probably an acceptable tradeoff
for not degrading on very large arrays.
I've not done any speed testing.
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
jsonb-with-offsets-and-lengths-4.patch | text/x-diff | 43.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Josh Berkus | 2014-09-26 04:40:19 | Re: jsonb format is pessimal for toast compression |
Previous Message | Robert Haas | 2014-09-26 02:44:49 | Re: Replication identifiers, take 3 |