Re: index on jsonb col with 2D array inside the json

From: Mariel Cherkassky <mariel(dot)cherkassky(at)gmail(dot)com>
To: Michael Lewis <mlewis(at)entrata(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: index on jsonb col with 2D array inside the json
Date: 2019-02-19 16:33:54
Message-ID: CA+t6e1=uCAB6AQL7Bw9r+7G6oa5HBNf9t9QPyVoG73v_GezARw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I dont have any indexes on R (the table with the jsonb column). I was
asking if I can create any that can increase this query`s performance.
If I understood you correctly I have 3 options right now :
1)R, without indexes
2)R2 with an index on first and last
3)R3 that should contain a single range column (type int4range) with gist
index on it.

In aspect of performance, R<R2<? R3

‫בתאריך יום ג׳, 19 בפבר׳ 2019 ב-18:28 מאת ‪Michael Lewis‬‏ <‪
mlewis(at)entrata(dot)com‬‏>:‬

> Is your JSON data getting toasted? I wouldn't assume so if it is remaining
> small but something to check. Regardless, if an index exists and isn't
> being used, then that would be the primary concern. You didn't share what
> the definition of the index on R.data is... what do you already have?
>
> You have an array of ranges stored as the value of key "ranges" in jsonb
> field data. If you created a table like R2, but with a single "range"
> column that is int4range type, then I would expect that you could add a
> GiST and then use overlaps &&, or another operator. I would not expect that
> you could index (unnest data->>'ranges' for instance) to get the separated
> out range values.
>
>
>
> *Michael Lewis *
>
>
> On Tue, Feb 19, 2019 at 8:59 AM Mariel Cherkassky <
> mariel(dot)cherkassky(at)gmail(dot)com> wrote:
>
>> Hi,
>> I have a table with json col : R(object int, data jsonb).
>> Example for content :
>> object | data
>> ----------------+---------------------------------------
>> 50 | {"ranges": [[1, 1]]}
>> 51 | {"ranges": [[5, 700],[1,5],[9,10}
>> 52 | {"ranges": [[4, 200],[2,4],[3,4]]}
>> 53 | {"ranges": [[2, 2]]}
>> 54 | {"ranges": [[5, 10]]}
>>
>> Now I tried to query for all the objects that contains a specific range,
>> for example [2,2] :
>> explain analyze SELECT *
>> FROM R d
>> WHERE EXISTS (
>> SELECT FROM jsonb_array_elements(R.data -> 'ranges') rng
>> WHERE (rng->>0)::bigint <= 2 and (rng->>1)::bigint >= 2
>> );
>>
>> I saw that the gin index isnt suitable for this type of comparison.
>> However, I saw that the gist index is suitable to handle ranges. Any idea
>> of I can implement a gist index here ?
>>
>> In addition, I saved the same data in relational table
>> R2(object,range_first,range_last).
>> The previous data in this format :
>> object range_first range_last
>> 50 1 1
>> 51 5 700
>> 51 1 5
>> 51 9 10
>>
>> i compared the first query with :
>> explain analyze select * from R2 where range_first <=2 and
>> range_last >= 2; (I have an index on range_first,range_last that is used)
>>
>> The query on the jsonb column was 100x slower (700 m/s vs 7m/s). The
>> question is, Am I missing an index or the jsonb datatype isnt suitable for
>> this structure of data. The R2 table contains 500K records while the R
>> table contains about 200K records.
>>
>>
>>
>>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Michael Lewis 2019-02-19 16:41:52 Re: index on jsonb col with 2D array inside the json
Previous Message Michael Lewis 2019-02-19 16:28:25 Re: index on jsonb col with 2D array inside the json