Re: Wanted: jsonb on-disk representation documentation

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Wanted: jsonb on-disk representation documentation
Date: 2014-05-07 19:50:24
Message-ID: CAM3SWZTkeN1KXD+63uoN7J0do-WNDOU0grEWkPqhUUL=AToi0w@mail.gmail.com
Lists: pgsql-hackers

On Wed, May 7, 2014 at 12:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The jsonb_ops storage format for values is inherently lossy, because it
> cannot distinguish the string values "n", "t", "f" from JSON null or
> boolean true, false respectively; nor does it know the difference between
> a number and a string containing digits. This appears to not quite be a
> bug because the consistent functions force recheck for all queries that
> care about values (as opposed to keys). But if it's documented anywhere
> I don't see where.
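A minimal Python sketch of the collision described above. It assumes a simplified entry layout of one flag byte (key or value) followed by the value's text. The flag constants and helper names are hypothetical and are not the actual jsonb_ops code. The point is only that once the JSON type is dropped, different JSON values yield identical entries, so an index match on a value cannot be trusted without a recheck.

```python
# Hypothetical model of a lossy entry format: a flag byte that only says
# "key" or "value", followed by the text. Constants are illustrative only.
KEY_FLAG = 0x01
VALUE_FLAG = 0x02


def value_text(v):
    """Render a JSON scalar as text, keeping no record of its JSON type."""
    if v is None:
        return "n"
    if v is True:          # checked before the int test: bool is an int in Python
        return "t"
    if v is False:
        return "f"
    if isinstance(v, (int, float)):
        return str(v)      # the number 42 becomes the text "42"
    return v               # strings are stored verbatim


def lossy_key_entry(k):
    return bytes([KEY_FLAG]) + k.encode("utf-8")


def lossy_value_entry(v):
    return bytes([VALUE_FLAG]) + value_text(v).encode("utf-8")


# Distinct JSON values that collapse to the same index entry:
assert lossy_value_entry(True) == lossy_value_entry("t")
assert lossy_value_entry(None) == lossy_value_entry("n")
assert lossy_value_entry(42) == lossy_value_entry("42")
# The one bit that is kept still separates keys from values:
assert lossy_key_entry("t") != lossy_value_entry("t")
```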

The fact that we *don't* set the recheck flag for
JsonbExistsStrategyNumber, and why that's okay, is prominently
documented. So I'd say that it is.

> And in any case, is it a good idea? We could fairly
> easily change things so that these cases are guaranteed distinguishable.
> We're using an entire byte to convey one bit of information (key or
> value); I'm inclined to redefine the flag byte so that it tells not just
> that but which JSON datatype is involved.
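Under the same simplified layout, the proposal above might look like the following sketch, where the flag byte records the JSON type as well as key-versus-value. The flag values and function names are hypothetical illustrations, not an actual or proposed on-disk format.

```python
# Hypothetical typed flag byte: one code for keys, one per JSON scalar type.
FLAG_KEY = 0x01
FLAG_NULL = 0x02
FLAG_BOOL = 0x03
FLAG_NUM = 0x04
FLAG_STR = 0x05


def typed_key_entry(k):
    return bytes([FLAG_KEY]) + k.encode("utf-8")


def typed_value_entry(v):
    """Build a value entry whose flag byte carries the JSON type."""
    if v is None:
        flag, text = FLAG_NULL, ""
    elif isinstance(v, bool):          # must precede the number test
        flag, text = FLAG_BOOL, ("t" if v else "f")
    elif isinstance(v, (int, float)):
        # str() keeps the sketch short; a real encoding would need one
        # canonical text per numeric value, since 1 and 1.0 compare equal.
        flag, text = FLAG_NUM, str(v)
    elif isinstance(v, str):
        flag, text = FLAG_STR, v
    else:
        raise TypeError("only JSON scalars produce value entries")
    return bytes([flag]) + text.encode("utf-8")
```

With the type in the flag byte, `true` and `"t"` (or `42` and `"42"`) no longer produce the same entry.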

It seemed simpler to do it that way. As I've said before, jsonb_ops is
mostly concerned with hstore-style indexing. It could also be
particularly useful for expression indexes on "tags" arrays of
strings, which is a common use-case.

jsonb_hash_ops, on the other hand, is for testing containment, which is
useful for querying heavily nested documents, where typically there is
a very low selectivity for keys. It's not the default, largely because
I was concerned about not supporting all indexable operators by
default, and because I suspected that it would be preferred to have a
default closer to hstore.
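The contrast with jsonb_hash_ops can be sketched as follows, assuming (as I understand the opclass's design) that it indexes only values, each hashed together with the chain of keys leading to it. `path_hash`, `path_entries` and `may_contain` are hypothetical helpers, and the truncated MD5 stands in for whatever 32-bit hash the real opclass uses.

```python
import hashlib
import json


def path_hash(path, value):
    """Hash a leaf value together with the keys leading to it (hypothetical)."""
    blob = json.dumps([list(path), value]).encode("utf-8")
    return hashlib.md5(blob).hexdigest()[:8]   # 32 bits: collisions are possible


def path_entries(doc, path=()):
    """Collect one hash per scalar, keyed by its path; no key-only entries."""
    entries = set()
    if isinstance(doc, dict):
        for k, v in doc.items():
            entries |= path_entries(v, path + (k,))
    elif isinstance(doc, list):
        for elem in doc:
            entries |= path_entries(elem, path)  # elements share the parent's path
    else:
        entries.add(path_hash(path, doc))
    return entries


def may_contain(doc, query):
    # Index-level test: every hash the query produces must appear in the doc.
    return path_entries(query) <= path_entries(doc)


query = {"a": {"b": 1}}
print(may_contain({"a": {"b": 1, "c": 2}}, query))  # True
print(may_contain({"a": {}, "b": 1}, query))        # False: "b" is not under "a"
```

Because no entry is made for a key on its own, an index built this way cannot answer key-existence queries, which is one reason such an opclass would not support every indexable operator. The hashes are also lossy, so matches would still need rechecking.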

Anyway, doing things that way for values (recording the JSON type in
the flag byte) won't obviate the need to set the recheck flag, unless
you come up with a much more sophisticated scheme. Existence (of keys)
is only tested with respect to the top level. Containment (where values
are tested) is more complicated.
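A small Python sketch of why a typed flag byte still would not make containment lossless. It assumes a jsonb_ops-style index that stores each key and each typed scalar value as a separate entry, with no record of nesting. `flat_entries` and the simplified `contains` are hypothetical illustrations, not PostgreSQL functions.

```python
def flat_entries(doc):
    """Collect every key and every typed scalar, ignoring where they occur."""
    entries = set()
    if isinstance(doc, dict):
        for k, v in doc.items():
            entries.add(("key", k))
            entries |= flat_entries(v)
    elif isinstance(doc, list):
        for elem in doc:
            entries |= flat_entries(elem)
    else:
        entries.add((type(doc).__name__, doc))  # typed, as in the proposal
    return entries


def contains(doc, query):
    """Simplified @> semantics for objects, arrays and scalars."""
    if isinstance(query, dict):
        return isinstance(doc, dict) and all(
            k in doc and contains(doc[k], v) for k, v in query.items())
    if isinstance(query, list):
        return isinstance(doc, list) and all(
            any(contains(d, q) for d in doc) for q in query)
    return type(doc) is type(query) and doc == query


query = {"a": {"b": 1}}
doc = {"a": {}, "b": 1}
print(flat_entries(query) <= flat_entries(doc))  # True: index reports a candidate
print(contains(doc, query))                      # False: recheck must reject it
```

Every entry the query needs is present in the document, so the index reports a candidate, yet the document does not contain the query because "b" sits at the top level rather than under "a". Only rechecking the actual row can reject it.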

--
Peter Geoghegan
