From: | Oleg Bartunov <obartunov(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tv(at)fuzzy(dot)cz> |
Cc: | Peter Geoghegan <pg(at)heroku(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Bruce Momjian <bruce(at)momjian(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: jsonb and nested hstore |
Date: | 2014-03-14 11:52:37 |
Message-ID: | CAF4Au4xwqyF78VB=zaj4Et8HiZE8AxTp7SE_Ogvs-t-G1G8p3Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
VODKA index will have no lenght limitation.
On Fri, Mar 14, 2014 at 3:07 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
> On 13 Březen 2014, 23:39, Peter Geoghegan wrote:
>> On Thu, Mar 13, 2014 at 2:21 AM, Greg Stark <stark(at)mit(dot)edu> wrote:
>>> It does sound like the main question here is which opclass should be
>>> the default. From the discussion there's a jsonb_hash_ops which works
>>> on all input values but supports fewer operators and a jsonb_ops which
>>> supports more operators but can't handle json with larger individual
>>> elements. Perhaps it's better to make jsonb_hash_ops the default so at
>>> least it's always safe to create a default gin index?
>>
>> Personally, I don't think it's a good idea to change the default. I
>> have yet to be convinced that if you hit the GIN limitation it's an
>> indication of anything other than that you need to reconsider your
>> indexing choices (how often have we heard that complaint of GIN before
>> in practice?). Even if you don't hit the limitation directly, with
>
> I've never used GIN with anything else than values that built-in full-text
> (tsvector), pg_trgm or points, and I suspect that's the case with most
> other users. All those types have "naturally limited" size (e.g. words
> tend to have very limited length, unless you're Maori, but even there the
> longest name is just 85 characters [1]).
>
> The only place in (core|contrib) where I'd expect this kind of issues is
> probably intarray, but it's arguably less frequently used than
> tsvector/pg_trgm for example.
>
> So ISTM this is the main reason why we don't see more complaints about the
> GIN size limit. I expect that to change with json + "index all" approach.
>
>> something like jsonb_hash_ops you're still hashing a large nested
>> structure, very probably uselessly. Are you really going to look for
>> an exact match to an elaborate nested structure? I would think,
>> probably not.
>
> What I find (very) useful is queries that look like this:
>
> SELECT if FROM json_table WHERE json_value @> '{"a" : {"b" : {"c" : 3}}}';
>
> or (without the @> operator) like this:
>
> SELECT if FROM json_table WHERE json_value #>> ARRAY['a', 'b', 'c'] = '3';
>
> or something like that ...
>
>> Now, as Alexander says, there might be a role for another
>> (jsonb_hash_ops) opclass that separately indexes values only. I still
>> think that by far the simplest solution is to use expressional
>> indexes, because we index key values and array element values
>> indifferently. Of course, nothing we have here precludes the
>> development of such an opclass.
>
> Maybe. I don't have much insight into ho GIN works / what is possible. But
> I think we should avoid having large number of opclasses, each supporting
> a small fraction of use cases. If we could keep the two we have right now,
> that'd be nice.
>
> regards
> Tomas
>
> [1] http://en.wikipedia.org/wiki/List_of_long_place_names
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Simon Riggs | 2014-03-14 12:12:18 | Re: plpgsql.warn_shadow |
Previous Message | Heikki Linnakangas | 2014-03-14 11:50:58 | Re: Archive recovery won't be completed on some situation. |