From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Kevin Grittner <kgrittn(at)ymail(dot)com> |
Cc: | Stephen Frost <sfrost(at)snowman(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Larry White <ljw1001(at)gmail(dot)com> |
Subject: | Re: jsonb format is pessimal for toast compression |
Date: | 2014-08-09 19:51:02 |
Message-ID: | 18816.1407613862@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Kevin Grittner <kgrittn(at)ymail(dot)com> writes:

>> Stephen Frost <sfrost(at)snowman(dot)net> writes:
>>> Trying to move the header to the end just for the sake of this
>>> doesn't strike me as a good solution as it'll make things quite
>>> a bit more complicated.

> Why is that? How much harder would it be to add a single offset
> field to the front to point to the part we're shifting to the end?
> It is not all that unusual to put a directory at the end, like in
> the .zip file format.

Yeah, I was wondering that too. Arguably, directory-at-the-end would
be easier to work with for on-the-fly creation, not that we do any
such thing at the moment. I think the main thing that's bugging Stephen
is that doing that just to make pglz_compress happy seems like a kluge
(and I have to agree).

Here's a possibly more concrete thing to think about: we may very well
someday want to support JSONB object field or array element extraction
without reading all blocks of a large toasted JSONB value, if the value is
stored external without compression. We already went to the trouble of
creating analogous logic for substring extraction from a long uncompressed
text or bytea value, so I think this is a plausible future desire. With
the current format you could imagine grabbing the first TOAST chunk, and
then if you see the header is longer than that you can grab the remainder
of the header without any wasted I/O, and for the array-subscripting case
you'd now have enough info to fetch the element value from the body of
the JSONB without any wasted I/O. With directory-at-the-end you'd
have to read the first chunk just to get the directory pointer, and this
would most likely not give you any of the directory proper; but at least
you'd know exactly how big the directory is before you go to read it in.
The former case is probably slightly better. However, if you're doing an
object key lookup, not an array element fetch, neither of these formats is
really friendly at all, because each binary-search probe probably requires
bringing in one or two toast chunks from the body of the JSONB value so
you can look at the key text. I'm not sure if there's a way to redesign
the format to make that less painful/expensive --- but certainly, having
the key texts scattered through the JSONB value doesn't seem like a great
thing from this standpoint.
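
To make the I/O comparison above a bit more tangible, here is a rough Python sketch that counts which TOAST chunks each access pattern would touch. Everything in it is a simplifying assumption for illustration: the 2000-byte chunk size, the 4-byte directory entries and leading word, the fixed element sizes, and the interleaved key/value body are not the actual JSONB on-disk layout, and the function names are hypothetical.

```python
CHUNK = 2000   # assumed TOAST chunk payload in bytes (illustrative, not exact)
ENTRY = 4      # assumed size of one directory entry in bytes
PREFIX = 4     # assumed size of the leading count / directory-pointer word


def chunks_for(start, length):
    """Chunk numbers covering the byte range [start, start + length)."""
    return set(range(start // CHUNK, (start + length - 1) // CHUNK + 1))


def array_fetch_header_first(n, elem_size, index):
    """Layout [count][directory][elements]: read the whole header, then the element."""
    header_len = PREFIX + n * ENTRY
    header = chunks_for(0, header_len)
    element = chunks_for(header_len + index * elem_size, elem_size)
    return header | element


def array_fetch_directory_last(n, elem_size, index):
    """Layout [pointer][elements][directory]: read chunk 0, the directory, the element."""
    dir_start = PREFIX + n * elem_size
    pointer = chunks_for(0, PREFIX)
    directory = chunks_for(dir_start, n * ENTRY)
    element = chunks_for(PREFIX + index * elem_size, elem_size)
    return pointer | directory | element


def key_lookup_probe_chunks(n_pairs, key_size, val_size, target):
    """Header-first object with key/value pairs interleaved in the body.

    Returns the body chunks touched by the key comparisons of a binary
    search for the pair at sorted position `target`.
    """
    body_start = PREFIX + 2 * n_pairs * ENTRY
    touched = set()
    lo, hi = 0, n_pairs - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        # Each probe has to look at the key text, which lives in the body.
        touched |= chunks_for(body_start + mid * (key_size + val_size), key_size)
        if mid == target:
            break
        if mid < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return touched


if __name__ == "__main__":
    print(len(array_fetch_header_first(100, 200, 50)))       # chunks, header first
    print(len(array_fetch_directory_last(100, 200, 50)))     # chunks, directory last
    print(len(key_lookup_probe_chunks(1000, 10, 190, 0)))    # body chunks per lookup
```

Under these assumptions, fetching element 50 of a 100-element array touches two chunks with the header first and three with the directory at the end, while a key lookup among 1000 interleaved pairs touches seven distinct chunks for the key comparisons alone. The exact numbers depend entirely on the assumed sizes; the point is only the shape of the difference.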
regards, tom lane