Re: JSONB index not in use, but is TOAST the real cause of slow query?

From: Shaheed Haque <shaheedhaque(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: JSONB index not in use, but is TOAST the real cause of slow query?
Date: 2022-05-29 16:03:20
Message-ID: CAHAc2jf04e2kPXbYcNHgcb_K7=PBG-uexk9P8L=e_zv2va9+Qw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Sun, 29 May 2022, 15:58 Tom Lane, <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Shaheed Haque <shaheedhaque(at)gmail(dot)com> writes:
> > Unfortunately, the real query which I think should behave very
> > similarly is still at the several-seconds level despite using the
> > index. ...
>
> > -> Bitmap Heap Scan on paiyroll_payrun (cost=26.88..30.91 rows=1
> > width=4) (actual time=32.488..2258.891 rows=62 loops=1)
> > Recheck Cond: ((company_id = 173) AND ((snapshot ->
> > 'employee'::text) ? '16376'::text))
> > Filter: (((snapshot #>
> > '{employee,16376,last_run_of_employment}'::text[]) <> 'true'::jsonb)
> > OR ((snapshot #> '{employee,16376,pay_graph}'::text[]) <> '0'::jsonb)
> > OR ((snapshot #> '{employee,16376,state,employment,-1,2}'::text[]) <=
> > '0'::jsonb))
> > Heap Blocks: exact=5
> > -> BitmapAnd (cost=26.88..26.88 rows=1 width=0) (actual
> > time=0.038..0.039 rows=0 loops=1)
> > -> Bitmap Index Scan on paiyroll_payrun_company_id_ce341888
> > (cost=0.00..6.56 rows=304 width=0) (actual time=0.016..0.016 rows=304
> > loops=1)
> > Index Cond: (company_id = 173)
> > -> Bitmap Index Scan on idx1 (cost=0.00..20.07 rows=9
> > width=0) (actual time=0.021..0.021 rows=62 loops=1)
> > Index Cond: ((snapshot -> 'employee'::text) ?
> '16376'::text)
>
> > IIUC, at the bottom, the indices are doing their thing, but a couple
> > of layers up, the "Bitmap Heap Scan" jumps from ~30ms to 2200ms. But I
> > cannot quite see why.
>
> I suppose it's the execution of that "Filter" condition, which will
> require perhaps as many as three fetches of the "snapshot" column.
>

Thanks, that's clearly in the frame.

You really need to rethink that data structure. Sure, you can store tons
> of unorganized data in a jsonb column, but you pay for that convenience
> with slow access. Normalizing the bits you need frequently into a more
> traditional relational schema is the route to better-performing queries.
>

Ack. Indeed, the current design works very well for all of the access
patterns other than this one, which only recently came into view as a
problem.

Ahead of contemplating a design change I have been looking at how to
optimise this bit. I'm currently mired in a crash course on SQL syntax as
pertains to JSONB, jsonpath et. al. And the equally mysterious side effects
of "?" and "@>" and so on in terms of the amount of data being fetched etc.
(and all wrapped in a dose of ORM for good measure).

I'll write separately with more specific questions if needed on those
details.

Thanks again for the kind help.

Shaheed

> regards, tom lane
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Alastair McKinley 2022-05-29 16:46:22 Function definition regression in 15beta1 when specific parameter name (string) is used
Previous Message Laurenz Albe 2022-05-29 15:02:28 Re: autovacuum on primary blocking queries on replica?