Re: jsonb and nested hstore

From: "Tomas Vondra" <tv(at)fuzzy(dot)cz>
To: "Peter Geoghegan" <pg(at)heroku(dot)com>
Cc: "Greg Stark" <stark(at)mit(dot)edu>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Oleg Bartunov" <obartunov(at)gmail(dot)com>, "Tomas Vondra" <tv(at)fuzzy(dot)cz>, "Pgsql Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: jsonb and nested hstore
Date: 2014-03-14 11:07:17
Message-ID: 44c19860b609a519a65d632ad7b91baf.squirrel@sq.gransy.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 13 Březen 2014, 23:39, Peter Geoghegan wrote:
> On Thu, Mar 13, 2014 at 2:21 AM, Greg Stark <stark(at)mit(dot)edu> wrote:
>> It does sound like the main question here is which opclass should be
>> the default. From the discussion there's a jsonb_hash_ops which works
>> on all input values but supports fewer operators and a jsonb_ops which
>> supports more operators but can't handle json with larger individual
>> elements. Perhaps it's better to make jsonb_hash_ops the default so at
>> least it's always safe to create a default gin index?
>
> Personally, I don't think it's a good idea to change the default. I
> have yet to be convinced that if you hit the GIN limitation it's an
> indication of anything other than that you need to reconsider your
> indexing choices (how often have we heard that complaint of GIN before
> in practice?). Even if you don't hit the limitation directly, with

I've never used GIN with anything else than values that built-in full-text
(tsvector), pg_trgm or points, and I suspect that's the case with most
other users. All those types have "naturally limited" size (e.g. words
tend to have very limited length, unless you're Maori, but even there the
longest name is just 85 characters [1]).

The only place in (core|contrib) where I'd expect this kind of issues is
probably intarray, but it's arguably less frequently used than
tsvector/pg_trgm for example.

So ISTM this is the main reason why we don't see more complaints about the
GIN size limit. I expect that to change with json + "index all" approach.

> something like jsonb_hash_ops you're still hashing a large nested
> structure, very probably uselessly. Are you really going to look for
> an exact match to an elaborate nested structure? I would think,
> probably not.

What I find (very) useful is queries that look like this:

SELECT if FROM json_table WHERE json_value @> '{"a" : {"b" : {"c" : 3}}}';

or (without the @> operator) like this:

SELECT if FROM json_table WHERE json_value #>> ARRAY['a', 'b', 'c'] = '3';

or something like that ...

> Now, as Alexander says, there might be a role for another
> (jsonb_hash_ops) opclass that separately indexes values only. I still
> think that by far the simplest solution is to use expressional
> indexes, because we index key values and array element values
> indifferently. Of course, nothing we have here precludes the
> development of such an opclass.

Maybe. I don't have much insight into ho GIN works / what is possible. But
I think we should avoid having large number of opclasses, each supporting
a small fraction of use cases. If we could keep the two we have right now,
that'd be nice.

regards
Tomas

[1] http://en.wikipedia.org/wiki/List_of_long_place_names

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2014-03-14 11:10:48 Re: plpgsql.warn_shadow
Previous Message Peter Geoghegan 2014-03-14 11:03:56 Re: Failure while inserting parent tuple to B-tree is not fun