Re: Unable to make use of "deep" JSONB index

From: Erik Rijkers <er(at)xs4all(dot)nl>
To: Shaheed Haque <shaheedhaque(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Unable to make use of "deep" JSONB index
Date: 2022-06-13 10:41:35
Message-ID: 25a9ac9c-e4ab-6914-f51f-b999ef4acf09@xs4all.nl
Lists: pgsql-bugs

On 12-06-2022 at 21:31, Shaheed Haque wrote:

> Thanks Erik. Is the point that the index has to be on the JSON field
> as a whole (i.e. "snapshot") rather than deep inside it (e.g.
> "snapshot.something.further[down]")?
>
> In my case, the snapshot is several MB in size (perhaps 10MB or even
> 20MB), dominated by the snapshot.employee (cardinality 10k, each sized
> as a dict 1-2kB as text). My expectation/guess is that an index of
> "snapshot" will itself be of a size of similar order. However the
> design as-is works very well except for this one case where to speed
> it up, in principle, the index need contain no more than one boolean
> per employee. So that's what I'd like to achieve, if possible.
>
> I've seen no hint in the documentation that creating the index on
> "snapshot.something.further[down]" should not work, and PG certainly
> allows it to be created. Also, Tom has suggested that I should not
> look to some magical ability to infer the use of the index from a
> differently structured query, and I've taken that on board with the
> updated query + index.

If you insist on a btree/integer on pay_graph, the below stuff seems to
work, no?

Perhaps you've missed the difference between x != 0 on the one hand, and x > 0 on the other.

Here are both queries to show that difference.
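
(For reproducibility, first a minimal sketch of a table and data that the index below could apply to; the exact schema and values are my assumption, not Shaheed's real structure.)

-- Hypothetical minimal setup; the real 'snapshot' layout is assumed, not known.
CREATE TABLE payrun (
    id       serial PRIMARY KEY,
    snapshot jsonb NOT NULL
);

-- Employee '999' once with pay_graph = 0 and once with pay_graph > 0.
INSERT INTO payrun (snapshot) VALUES
  ('{"employee": {"999": {"pay_graph": 0, "state": {"employment": [["1970-01-01", null, 3]]}}}}'),
  ('{"employee": {"999": {"pay_graph": 1, "state": {"employment": [["1970-01-01", null, 3]]}}}}');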

CREATE INDEX payrun_btree_paygr_idx ON payrun using btree
(((snapshot->'employee'->'999'->>'pay_graph')::integer));

set enable_seqscan = 0;  -- discourage seq scans so index usage (or the lack of it) is obvious

SELECT id, snapshot #>'{employee,999,state,employment}'
FROM payrun
WHERE (snapshot->'employee'->'999'->>'pay_graph')::integer > 0 ;
id | ?column?
----+---------------------------
2 | [["1970-01-01", null, 3]]
(1 row)

Time: 1.384 ms

explain analyze SELECT id, snapshot #>'{employee,999,state,employment}'
FROM payrun WHERE (snapshot->'employee'->'999'->>'pay_graph')::integer > 0 ;
                                                            QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Index Scan using payrun_btree_paygr_idx on payrun  (cost=0.14..8.98 rows=42 width=36) (actual time=0.018..0.020 rows=1 loops=1)
   Index Cond: (((((snapshot -> 'employee'::text) -> '999'::text) ->> 'pay_graph'::text))::integer > 0)
 Planning Time: 0.047 ms
 Execution Time: 0.048 ms
(4 rows)

Time: 0.386 ms

explain analyze SELECT id, snapshot #>'{employee,999,state,employment}'
FROM payrun WHERE (snapshot->'employee'->'999'->>'pay_graph')::integer != 0 ;
                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Seq Scan on payrun  (cost=10000000000.00..10000000004.49 rows=126 width=36) (actual time=112.182..112.186 rows=1 loops=1)
   Filter: (((((snapshot -> 'employee'::text) -> '999'::text) ->> 'pay_graph'::text))::integer <> 0)
   Rows Removed by Filter: 3
 Planning Time: 0.050 ms
 JIT:
   Functions: 4
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.636 ms, Inlining 41.761 ms, Optimization 52.033 ms, Emission 18.228 ms, Total 112.658 ms
 Execution Time: 153.486 ms
(9 rows)

Time: 153.835 ms

So:
0.386 ms for x > 0, which uses payrun_btree_paygr_idx,
vs:
153.835 ms for x != 0, which falls back to a seq scan.
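
If you really do need the != form, one possible workaround (my suggestion, untested against your data) is to spell it as two range conditions; the planner can answer those with the btree index, typically via a BitmapOr of two index scans:

-- Workaround sketch (assumption: pay_graph is non-NULL for the rows of interest).
-- A btree index cannot be used for <> directly, but it can serve both halves of
-- the equivalent OR of range conditions.
SELECT id, snapshot #>'{employee,999,state,employment}'
FROM payrun
WHERE (snapshot->'employee'->'999'->>'pay_graph')::integer > 0
   OR (snapshot->'employee'->'999'->>'pay_graph')::integer < 0;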

It's hard for me to see where you encounter problems; including the fast
searches in my earlier mail, it looks to me like you have many good options.

Erik Rijkers
