Re: jsonb and nested hstore

From: Oleg Bartunov <obartunov(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: jsonb and nested hstore
Date: 2014-03-12 18:57:33
Message-ID: CAF4Au4wCbersO4n0z6SA3=XxURBpLpv1LfGsCMS5fb6SX0W+rA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Also, GiST index is faster for create/update operations. I really hope we will
improve jsonb indexing in the next one-two releases. For now I'd suggest people
index expressional indexes to index just interesting keys or use GiST.

On Wed, Mar 12, 2014 at 5:15 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
> On 12 Březen 2014, 0:41, Peter Geoghegan wrote:
>> On Tue, Mar 11, 2014 at 3:58 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>>> ERROR: index row size 1416 exceeds maximum 1352 for index "gin_idx"
>>
>> All index AMs have similar restrictions.
>
> Yes, I know and I have no problem with restrictions in general. You may
> run into similar issues with btree indexes on text columns with long text,
> for example. The thing is that people don't generally index text directly,
> because it usually does not make much sense, but using tsvector etc.
>
> But with jsonb it's more likely because indexing is one of the goodies (at
> least for me). And the discussions with several people interested in
> storing json data I had recently went often like this:
>
> me: It seems we'll have a better json datatype in 9.4.
> them: Nice!
> me: And it will be possible to do searches on arbitrary keys.
> them: Yay!
> me: And we actually got pretty significant improvements in GIN indexes.
> them: Awesome!
> me: But the values you may index need to be less than ~1500B.
> them: Bummer :-(
> me: Well, you can use GIST then.
>
>>> A good example of such header is "dkim-signature" which basically
>>> contains the whole message digitally signed with DKIM. The signature
>>> tends to be long and non-compressible, thanks to the signature.
>>>
>>> I'm wondering what's the best way around this, because I suspect many
>>> new users (especially those attracted by jsonb and GIN improvements)
>>> will run into this. Maybe not immediately, but eventully they'll try to
>>> insert a jsonb with long value, and it will fail ...
>>
>> The jsonb_hash_ops operator class just stores a 32-bit integer hash
>> value (it always sets the recheck flag, which only some of the other
>> default GIN opclass' strategies do). It only supports containment, and
>> not the full variety of operators that the default opclass supports,
>> which is why it isn't the default. I think that in practice the
>> general recommendation will be that when indexing at the "top level",
>> use jsonb_hash_ops. When indexing nested items, use the more flexible
>> default GIN opclass. That seems like a pretty smart trade-off to me.
>
> OK, I'll look into the jsonb_hash_ops - that sounds more or less like what
> I was thinking about (and sure, storing hashes makes some operations
> impossible to support).
>
> The other thing I was thinking about is introducing some kind of upper
> limit for the value length - e.g. index just the first 1kB, or something
> like that. My experience is most values are way shorter, or actually
> differ in the first 1kB, so this should allow most decisions to be made.
> But I'm not really that familiar with how GIN works, so maybe this is
> nonsense.
>
>> The more I think about it, the more inclined I am to lose GiST support
>> entirely for the time being. It lets us throw out about 700 lines of C
>> code, which is a very significant fraction of the total, removes the
>> one open bug, and removes the least understood part of the code. The
>> GiST opclass is not particularly compelling for this.
>
> I disagree with that. I see GiST as a simple fallback option for the cases
> I described. I wasn't able to create a GIN index because of exceeding the
> max item length, but GiST created just fine. It was considerably slower,
> but it worked.
>
> Tomas
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2014-03-12 19:00:25 Replication slots and footguns
Previous Message Robert Haas 2014-03-12 18:46:01 Re: Patch: show relation and tuple infos of a lock to acquire