Re: JSONB index not in use, but is TOAST the real cause of slow query?

From: Shaheed Haque <shaheedhaque(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: JSONB index not in use, but is TOAST the real cause of slow query?
Date: 2022-05-28 20:48:57
Message-ID: CAHAc2jeCoX=Z9Lzd4b21KdzoozRK8qS-zO6-oC5H9e-Cm5b5aQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Adrian, Tom, thanks for the input. Based on that, it occurred to me to
create some "dummy" rows and - almost magically - the index kicks in!
Before is 2500ms:

============
Seq Scan on paiyroll_payrun (cost=0.00..52.43 rows=17 width=32)
(actual time=53.127..2567.024 rows=104 loops=1)
Filter: ((snapshot -> 'employee'::text) ? '2209'::text)
Rows Removed by Filter: 1835
Planning Time: 0.060 ms
Execution Time: 2567.044 ms
(5 rows)
============

After is 300ms:

============
Bitmap Heap Scan on paiyroll_payrun (cost=36.11..64.67 rows=14
width=32) (actual time=4.189..311.932 rows=104 loops=1)
Recheck Cond: ((snapshot -> 'employee'::text) ? '2209'::text)
Rows Removed by Index Recheck: 1
Heap Blocks: exact=8
-> Bitmap Index Scan on idx1 (cost=0.00..36.10 rows=14 width=0)
(actual time=0.087..0.087 rows=105 loops=1)
Index Cond: ((snapshot -> 'employee'::text) ? '2209'::text)
Planning Time: 0.167 ms
Execution Time: 311.962 ms
(8 rows)
============

Woot!

Unfortunately, the real query which I think should behave very
similarly is still at the several-seconds level despite using the
index. Before 3600ms:

============
SELECT "paiyroll_payrun"."actual_t"
FROM "paiyroll_payrun"
WHERE ("paiyroll_payrun"."company_id" = 173 AND
("paiyroll_payrun"."snapshot" -> 'employee') ? '16376'
AND NOT (("paiyroll_payrun"."snapshot" #>
ARRAY['employee','16376','last_run_of_employment']) = 'true'
AND ("paiyroll_payrun"."snapshot" #>
ARRAY['employee','16376','pay_graph']) = '0'
AND ("paiyroll_payrun"."snapshot" #>
ARRAY['employee','16376','state','employment','-1','2']) > '0'))
ORDER BY "paiyroll_payrun"."actual_t" DESC
LIMIT 1
============

============
Limit (cost=31.33..31.33 rows=1 width=4) (actual
time=3595.174..3595.176 rows=1 loops=1)
-> Sort (cost=31.33..31.33 rows=3 width=4) (actual
time=3595.174..3595.174 rows=1 loops=1)
Sort Key: actual_t DESC
Sort Method: top-N heapsort Memory: 25kB
-> Bitmap Heap Scan on paiyroll_payrun (cost=6.43..31.31 rows=3
width=4) (actual time=44.575..3595.082 rows=62 loops=1)
Recheck Cond: (company_id = 173)
Filter: (((snapshot -> 'employee'::text) ? '16376'::text) AND
(((snapshot #> '{employee,16376,last_run_of_employment}'::text[]) <>
'true'::jsonb) OR ((snapshot #> '{employee,16376,pay_graph}'::text[])
<> '0'::jsonb) OR ((snapshot #>
'{employee,16376,state,employment,-1,2}'::text[]) <= '0'::jsonb)))
Rows Removed by Filter: 242
Heap Blocks: exact=9
-> Bitmap Index Scan on paiyroll_payrun_company_id_ce341888
(cost=0.00..6.43 rows=304 width=0) (actual time=0.013..0.013 rows=304
loops=1)
Index Cond: (company_id = 173)
Planning Time: 0.258 ms
Execution Time: 3595.195 ms
============

After, with the index in use, 2200ms:

================
Limit (cost=30.92..30.93 rows=1 width=4) (actual
time=2258.989..2258.990 rows=1 loops=1)
-> Sort (cost=30.92..30.93 rows=1 width=4) (actual
time=2258.988..2258.989 rows=1 loops=1)
Sort Key: actual_t DESC
Sort Method: top-N heapsort Memory: 25kB
-> Bitmap Heap Scan on paiyroll_payrun (cost=26.88..30.91 rows=1
width=4) (actual time=32.488..2258.891 rows=62 loops=1)
Recheck Cond: ((company_id = 173) AND ((snapshot ->
'employee'::text) ? '16376'::text))
Filter: (((snapshot #>
'{employee,16376,last_run_of_employment}'::text[]) <> 'true'::jsonb)
OR ((snapshot #> '{employee,16376,pay_graph}'::text[]) <> '0'::jsonb)
OR ((snapshot #> '{employee,16376,state,employment,-1,2}'::text[]) <=
'0'::jsonb))
Heap Blocks: exact=5
-> BitmapAnd (cost=26.88..26.88 rows=1 width=0) (actual
time=0.038..0.039 rows=0 loops=1)
-> Bitmap Index Scan on paiyroll_payrun_company_id_ce341888
(cost=0.00..6.56 rows=304 width=0) (actual time=0.016..0.016 rows=304
loops=1)
Index Cond: (company_id = 173)
-> Bitmap Index Scan on idx1 (cost=0.00..20.07 rows=9
width=0) (actual time=0.021..0.021 rows=62 loops=1)
Index Cond: ((snapshot -> 'employee'::text) ? '16376'::text)
Planning Time: 0.245 ms
Execution Time: 2259.019 ms
===================

IIUC, at the bottom, the indices are doing their thing, but a couple
of layers up, the "Bitmap Heap Scan" jumps from ~30ms to 2200ms. But I
cannot quite see why. Have I missed a needed index or what? I'm pretty
much a novice at SQL, so it is quite possible I've missed something
very basic.

Thanks, Shaheed

On Sat, 28 May 2022 at 20:59, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Shaheed Haque <shaheedhaque(at)gmail(dot)com> writes:
> > One last thought about TOAST. If the cost of the -> retrieving the
> > data cannot be obviated, is there any way to tweak how that works?
>
> The only thing that's readily tweakable is to disable data compression
> for the out-of-line values (see ALTER TABLE ... SET STORAGE, and note
> that you have to rebuild the table for already-stored values to be
> changed). It seems unlikely that that will help you much though,
> since doing that would save CPU at the cost of more disk I/O, and
> it seems that the I/O side is your bottleneck already. It *would*
> help if jsonb had logic for partial fetches, because that'd require
> uncompressed storage to work. But AFAICS that's not there yet.
> I distinctly recall that that was planned for when we were designing
> jsonb's on-disk layout, but I see no code actually doing anything
> of that sort ... the -> operator certainly doesn't.
>
> regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Nathan Bossart 2022-05-29 03:30:33 Re: Extension pg_trgm, permissions and pg_dump order
Previous Message Thomas Munro 2022-05-28 20:33:46 Re: Improve configurability for IO related behavoir