Re: Unable to make use of "deep" JSONB index

From: Shaheed Haque <shaheedhaque(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Unable to make use of "deep" JSONB index
Date: 2022-06-12 09:34:10
Message-ID: CAHAc2jfokgv5GZ-eLrAJ8G52gLqSRy_VR-s0R90uJxui-YSQsQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

OK, I have corrected and simplified the test case (including switching
to a btree index). The WHERE clause and the inex now look like this:

...WHERE ((snapshot -> 'employee' -> '999' ->>
'pay_graph')::integer != 0);
...USING btree (((snapshot -> 'employee' -> '$.*' ->>
'pay_graph')::integer != 0));

But the index is still not being used (test case below). I have
confirmed that the equality operator is listed for "search" (I assume
inequality is the same as equality, but I tried both):

# \dAo btree jsonb*
List of operators of operator families
AM | Operator family | Operator | Strategy | Purpose
-------+-----------------+-----------------+----------+---------
btree | jsonb_ops | <(jsonb,jsonb) | 1 | search
btree | jsonb_ops | <=(jsonb,jsonb) | 2 | search
btree | jsonb_ops | =(jsonb,jsonb) | 3 | search
btree | jsonb_ops | >=(jsonb,jsonb) | 4 | search
btree | jsonb_ops | >(jsonb,jsonb) | 5 | search
(5 rows)

If this is not a bug, then how should the query or the index be
changed to make this work?

=== begin test case ===

CREATE TABLE payrun (
id serial primary key,
snapshot JSONB
);

INSERT INTO payrun(snapshot)
VALUES
('{"employee": {"999": {"id": 999, "state": {"employment":
[["1920-01-01", null, 3]]}, "pay_graph": 0, "last_run_of_employment":
false}, "111": {"id": 111, "state": {"employment": [["1920-01-01",
null, 5]]}, "pay_graph": 3, "last_run_of_employment": true}}}'),
('{"employee": {"999": {"id": 999, "state": {"employment":
[["1970-01-01", null, 3]]}, "pay_graph": 6, "last_run_of_employment":
true}, "222": {"id": 222, "state": {"employment": [["1920-01-01",
null, 5]]}, "pay_graph": 5, "last_run_of_employment": true}}}'),
('{"employee": {"998": {"id": 998, "state": {"employment":
[["1980-01-01", null, 3]]}, "pay_graph": 7, "last_run_of_employment":
false}, "333": {"id": 333, "state": {"employment": [["1920-01-01",
null, 5]]}, "pay_graph": 3, "last_run_of_employment": true}}}')
;

SELECT id,snapshot #>'{employee,999,state,employment}' FROM "payrun"
WHERE ((snapshot -> 'employee' -> '999' ->> 'pay_graph')::integer !=
0);

--
-- Create index designed to match the query.
--
CREATE INDEX idx1 ON payrun USING btree (((snapshot -> 'employee' ->
'$.*' ->> 'pay_graph')::integer != 0));

set enable_seqscan = OFF;

--
-- EXPLAIN ANALYZE ...query above...
--
explain analyze SELECT id,snapshot #>'{employee,999,state,employment}'
FROM "payrun" WHERE ((snapshot -> 'employee' -> '999' ->>
'pay_graph')::integer != 0);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Seq Scan on payrun (cost=10000000000.00..10000000001.08 rows=2
width=36) (actual time=70.051..70.052 rows=1 loops=1)
Filter: (((((snapshot -> 'employee'::text) -> '999'::text) ->>
'pay_graph'::text))::integer <> 0)
Rows Removed by Filter: 2
Planning Time: 0.147 ms
JIT:
Functions: 4
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.354 ms, Inlining 8.305 ms, Optimization 49.237
ms, Emission 12.499 ms, Total 70.395 ms
Execution Time: 70.428 ms
(9 rows)

=== end test case ===

Thanks, Shaheed

On Thu, 2 Jun 2022 at 16:51, Shaheed Haque <shaheedhaque(at)gmail(dot)com> wrote:
>
> On Thu, 2 Jun 2022 at 15:31, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > Shaheed Haque <shaheedhaque(at)gmail(dot)com> writes:
> > > -- Create index designed to match the query.
> > > --
> > > create index idx1 on payrun using gin ((snapshot->'$.employee.* ?
> > > (@.pay_graph <> 0 || @.last_run_of_employment == true ||
> > > @.state.employment[last][2] == 0)'));
> >
> > But that doesn't match the query; it's not even the same topmost
> > operator:
> >
> > > explain analyse SELECT id,snapshot #>'{employee,999,state,employment}' FROM
> > > "payrun" WHERE (snapshot @? '$.employee."999" ? (@.pay_graph <> 0 ||
> > > @.last_run_of_employment
> > > == true || @.state.employment[last][2] == 0)');
>
> I assume you are referring to the difference between "snapshot @?" and
> "snapshot ->"? If so, apologies: too much cutting and pasting from too
> many experiments. I did in fact also try the "using btree ((snapshot
> @?" form but it gave the same results.
>
> > In general you seem to have much too high an opinion of what PG's
> > index machinery can cope with. The general pattern is that it can
> > use a query WHERE clause with an index if the clause is of the form
> > "indexed-column indexable-operator constant". There's a small number
> > of special cases where it can transform things that don't initially
> > look like that into the right form, but AFAIR we don't have any
> > such special cases for any json-related operators.
>
> LOL. I'm pretty much a noob here, so that's very possible.
>
> > The one saving grace is that "indexed-column" can be an expression
> > appearing in an index, so in some cases you can finesse things
> > that way. But you won't find any deep knowledge of jsonpath
> > expressions in there.
>
> I was basing my efforts on this statement in the docs
> https://www.postgresql.org/docs/14/datatype-json.html#JSON-INDEXING:
>
> GIN index extracts statements of following form out of jsonpath:
> accessors_chain = const. Accessors chain may consist of .key, [*], and
> [index] accessors. jsonb_ops additionally supports .* and .**
> accessors.
>
> Did I mis-implement, misunderstand or read too much into this?
>
> > Having said that, @? is reported as an indexable operator in v14:
> >
> > regression=# \dAo gin jsonb*
> > List of operators of operator families
> > AM | Operator family | Operator | Strategy | Purpose
> > -----+-----------------+--------------------+----------+---------
> > gin | jsonb_ops | @>(jsonb,jsonb) | 7 | search
> > gin | jsonb_ops | @?(jsonb,jsonpath) | 15 | search
> > gin | jsonb_ops | @@(jsonb,jsonpath) | 16 | search
> > gin | jsonb_ops | ?(jsonb,text) | 9 | search
> > gin | jsonb_ops | ?|(jsonb,text[]) | 10 | search
> > gin | jsonb_ops | ?&(jsonb,text[]) | 11 | search
> > gin | jsonb_path_ops | @>(jsonb,jsonb) | 7 | search
> > gin | jsonb_path_ops | @?(jsonb,jsonpath) | 15 | search
> > gin | jsonb_path_ops | @@(jsonb,jsonpath) | 16 | search
> > (9 rows)
> >
> > so it seems like you ought to get some benefit for this query
> > from just a plain GIN index on "snapshot".
>
> Interesting. I'm pretty sure I started there a few days ago without
> any luck but I'll give it another spin (having learnt quite a bit
> since then).
>
> >
> > regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Erik Rijkers 2022-06-12 10:53:52 Re: Unable to make use of "deep" JSONB index
Previous Message Michael Paquier 2022-06-11 07:35:22 Re: Re: BUG #17504: psql --single-transaction -vON_ERROR_STOP=1 still commits after client-side error