Re: Wanted: jsonb on-disk representation documentation

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Wanted: jsonb on-disk representation documentation
Date: 2014-05-07 08:36:40
Message-ID: 5369F098.3010101@vmware.com
Lists: pgsql-hackers

On 05/06/2014 11:30 PM, Peter Geoghegan wrote:
> On Tue, May 6, 2014 at 12:48 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> Enthusiastically seconded. I've asked for that about three times without much success. If it had been my decision, the patch wouldn't have been merged without that and other adjustments.
>
> I'm almost certain that the only feedback of yours that I didn't
> incorporate was that I didn't change the name of JsonbValue, a
> decision I stand by, and also that I didn't add ascii art to
> illustrate the on-disk format. I can write a patch that adds the
> latter soon.

That would be great.

I found the serialization routine, convertJsonb(), to be a bit of a mess.
It's maintaining a custom stack of levels, which can be handy if you
need to avoid recursion, but it's also relying on the native stack. And
I didn't understand the point of splitting it into the "walk" and "put"
functions; the division of labor between the two was far from clear
IMHO. I started refactoring that, and ended up with the attached.
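
To illustrate the distinction between a custom stack and the native stack, here is a minimal sketch that walks a nested structure two ways: once with plain recursion and once with an explicit heap-allocated stack. The Node type and both function names are hypothetical, made up for this example. They are not taken from jsonb_util.c or from the attached patch. The point is only that each variant commits to a single mechanism instead of mixing the two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical nested value; NOT PostgreSQL's JsonbValue. */
typedef struct Node
{
	int			  nchildren;	/* 0 means a scalar leaf in this sketch */
	struct Node **children;		/* array of nchildren child pointers */
} Node;

/* Variant 1: rely on the native call stack only. */
size_t
count_leaves_recursive(const Node *node)
{
	size_t		total = 0;

	if (node->nchildren == 0)
		return 1;
	for (int i = 0; i < node->nchildren; i++)
		total += count_leaves_recursive(node->children[i]);
	return total;
}

/* Variant 2: rely on an explicit, growable stack only; no recursion. */
size_t
count_leaves_iterative(const Node *root)
{
	size_t		cap = 8;
	size_t		top = 0;
	size_t		total = 0;
	const Node **stack = malloc(cap * sizeof(*stack));

	if (stack == NULL)
		abort();
	stack[top++] = root;

	while (top > 0)
	{
		const Node *node = stack[--top];

		if (node->nchildren == 0)
		{
			total++;
			continue;
		}
		for (int i = 0; i < node->nchildren; i++)
		{
			if (top == cap)
			{
				/* Out of room: double the stack before pushing. */
				const Node **bigger;

				cap *= 2;
				bigger = realloc(stack, cap * sizeof(*stack));
				if (bigger == NULL)
					abort();
				stack = bigger;
			}
			stack[top++] = node->children[i];
		}
	}
	free(stack);
	return total;
}
```

The recursive variant is shorter, but its depth is bounded by the native stack. The iterative variant trades some bookkeeping for a depth limited only by available memory.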

One detail that I found scary is that the estSize field in JsonbValue is
not just any rough estimate. It's used in the allocation of the output
buffer for convertJsonb(), so it has to be large enough or you hit an
assertion or buffer overflow. I believe it was correct as it was, but
that kind of programming is always scary. I refactored the
convertJsonb() function to use a StringInfo buffer instead, and removed
estSize altogether.
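
As a rough illustration of why a growable buffer removes the need for an exact size estimate, here is a minimal standalone sketch. GrowBuf and the growbuf_* functions are hypothetical names invented for this example. This is not StringInfo and not the code in the attached patch; it only shows the general technique of enlarging the buffer on demand, so an append can never write past the allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer (illustration only). */
typedef struct GrowBuf
{
	char	   *data;			/* heap allocation, or NULL while empty */
	size_t		len;			/* bytes currently used */
	size_t		cap;			/* bytes currently allocated */
} GrowBuf;

void
growbuf_init(GrowBuf *buf)
{
	buf->data = NULL;
	buf->len = 0;
	buf->cap = 0;
}

/*
 * Make sure at least 'needed' more bytes fit, doubling the capacity
 * until they do. Size overflow is ignored in this sketch.
 */
void
growbuf_reserve(GrowBuf *buf, size_t needed)
{
	size_t		newcap = buf->cap ? buf->cap : 16;
	char	   *newdata;

	if (needed <= buf->cap - buf->len)
		return;					/* already enough room */
	while (newcap - buf->len < needed)
		newcap *= 2;
	newdata = realloc(buf->data, newcap);
	if (newdata == NULL)
		abort();
	buf->data = newdata;
	buf->cap = newcap;
}

/* Append n bytes; the buffer grows first, so no estimate is required. */
void
growbuf_append(GrowBuf *buf, const void *src, size_t n)
{
	if (n == 0)
		return;
	growbuf_reserve(buf, n);
	memcpy(buf->data + buf->len, src, n);
	buf->len += n;
	assert(buf->len <= buf->cap);
}

void
growbuf_free(GrowBuf *buf)
{
	free(buf->data);
	growbuf_init(buf);
}
```

With a pre-sized buffer, an estimate that is too small means an overflow. Here the worst case of a poor initial size is a few extra realloc() calls.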

This is still work-in-progress, but I thought I'd post this now to let
people know I'm working on it. For example, StringInfo isn't actually
very well suited for this purpose; it might be better to have a
custom buffer that's enlarged when needed.

For my own sanity, I started writing some docs on the on-disk format.
See the comments in jsonb.h for my understanding of it. I moved around
the structs a bit in jsonb.h, to make the format clearer, but the actual
on-disk format is unchanged.

- Heikki

Attachment Content-Type Size
jsonb-cleanup-1.patch text/x-diff 32.2 KB
