| From: | Josh Berkus <josh(at)agliodbs(dot)com> | 
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
| Cc: | Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Jan Wieck <jan(at)wi3ck(dot)info> | 
| Subject: | Re: jsonb format is pessimal for toast compression | 
| Date: | 2014-09-27 01:20:14 | 
| Message-ID: | 542610CE.2010803@agliodbs.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
All,
So these results have become a bit complex. So spreadsheet time.
https://docs.google.com/spreadsheets/d/1Mokpx3EqlbWlFDIkF9qzpM7NneN9z-QOXWSzws3E-R4
Some details:
The Length-and-Offset test was performed using a more recent 9.4
checkout than the other two tests.  This was regrettable, and due to a
mistake with git, since the results tell me that there have been some
other changes.
I added two new datasets:
errlog2 is a simple, 4-column error log in JSON format, with 2 small
values and 2 large values in each datum.  It was there to check if any
of our changes affected the performance or size of such simple
structures (answer: no).
processed_b is a synthetic version of Mozilla Socorro's crash dumps,
about 900,000 of them, with nearly identical JSON on each row. These are
large json values (around 4KB each) with a broad mix of values and 5
levels of nesting.  However, none of the levels have very many keys per
level; the max is that the top level has up to 40 keys.  Unlike the
other data sets, I can provide a copy of processed_b for asking.
So, some observations:
* Data sizes with lengths-and-offets are slightly (3%) larger than
all-lengths for the pathological case (jsonbish) and unaffected for
other cases.
* Even large, complex JSON (processed_b) gets better compression with
the two patches than with head, although only slightly better (16%)
* This better compression for processed_b leads to slightly slower
extraction (6-7%), and surprisingly slower extraction for
length-and-offset than for all-lengths (about 2%).
* in the patholgical case, length-and-offset was notably faster on Q1
than all-lengths (24%), and somewhat slower on Q2 (8%).  I think this
shows me that I don't understand what JSON keys are "at the end".
* notably, length-and-offset when uncompressed (EXTERNAL) was faster on
Q1 than head!  This was surprising enough that I retested it.
Overall, I'm satisfied with the performance of the length-and-offset
patch.
-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Jeff Janes | 2014-09-27 02:55:33 | Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ] | 
| Previous Message | Gregory Smith | 2014-09-26 23:36:53 | Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ] |