Re: Wanted: jsonb on-disk representation documentation

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Wanted: jsonb on-disk representation documentation
Date: 2014-05-07 18:13:44
Message-ID: 20140507181344.GK13397@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2014-05-07 10:55:28 -0700, Peter Geoghegan wrote:
> On Wed, May 7, 2014 at 4:40 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > * The jentry representation should be changed so it's possible to get the type
> > of a entry without checking individual types. That'll make code like
> > findJsonbValueFromSuperHeader() (well, whatever you've renamed it to)
> > much easier to read. Preferrably so it an have the same values (after
> > shifting/masking) ask the JBE variants. And it needs space for futher
> > types (or representations thereof).
>
> I don't know what you mean. Every place that macros like
> JBE_ISNUMERIC() are used subsequently involves access to (say) numeric
> union fields. At best, you could have those functions check that the
> types match, and then handle each case in a switch that only looked at
> (say) the "candidate", but that doesn't really save much at all. It
> wouldn't take much to have the macros give enum constant values back
> as you suggest, though.

Compare
for (i = 0; i < count; i++)
{
JEntry *e = array + i;

if (JBE_ISNULL(*e) && key->type == jbvNull)
{
result->type = jbvNull;
result->estSize = sizeof(JEntry);
}
else if (JBE_ISSTRING(*e) && key->type == jbvString)
{
result->type = jbvString;
result->val.string.val = data + JBE_OFF(*e);
result->val.string.len = JBE_LEN(*e);
result->estSize = sizeof(JEntry) + result->val.string.len;
}
else if (JBE_ISNUMERIC(*e) && key->type == jbvNumeric)
{
result->type = jbvNumeric;
result->val.numeric = (Numeric) (data + INTALIGN(JBE_OFF(*e)));

result->estSize = 2 * sizeof(JEntry) +
VARSIZE_ANY(result->val.numeric);
}
else if (JBE_ISBOOL(*e) && key->type == jbvBool)
{
result->type = jbvBool;
result->val.boolean = JBE_ISBOOL_TRUE(*e) != 0;
result->estSize = sizeof(JEntry);
}
else
continue;

if (compareJsonbScalarValue(key, result) == 0)
return result;
}
with

for (i = 0; i < count; i++)
{
JEntry *e = array + i;

if (!JBE_TYPE_IS_SCALAR(*e))
continue;

if (JBE_TYPE(*e) != key->type)
continue;

result = getJsonbValue(e);

if (compareJsonbScalarValue(key, result) == 0)
return result;
}

Yes, it's not a fair comparison because I made up getJsonbValue(). But
it's *much* more readable regardless. And there's more places that could
use it. Like the second half of findJsonbValueFromSuperHeader(). FWIW,
that's one of the requests I definitely made before.

> > * I wonder if the hash/object pair representation is wise and if it
> > shouldn't be keys combined with offsets first, and then the
> > values. That will make access much faster. And that's what jsonb
> > essentially is about.
>
> I like that the physical layout reflects the layout of the original
> JSON document.

Don't see much value in that. This is a binary representation and it'd
be bijective.

> Besides, I don't think this is obviously the case. Are
> you sure that it won't be more useful to make children as close to
> their parents as possible? Particularly given idiomatic usage.

Because - if done right - it would allow elementwise access without
scanning previous values.

> > * I think both arrays and hashes should grow individual structs. With
> > space for additional flags.

> Why?

Because a) it will make the code more readable b) it'll allow for
adding different representations of hashes/arrays.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2014-05-07 18:20:34 Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers
Previous Message Peter Geoghegan 2014-05-07 18:13:14 Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers