From: | "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org> |
---|---|
To: | Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> |
Cc: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Oleg Bartunov <obartunov(at)postgrespro(dot)ru>, Michael Paquier <michael(at)paquier(dot)xyz>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Subject: | Re: jsonpath |
Date: | 2019-03-17 17:00:29 |
Message-ID: | 3247f0b1-ad4a-2d4e-e9b1-1eedc1e7e81a@postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 3/17/19 12:55 PM, Alexander Korotkov wrote:
>
>> However, when I did something a little more complex, like the below:
>>
>> SELECT count(*)
>> FROM news_feed
>> WHERE data @? '$.length ? (@ < 150)';
>>
>> SELECT count(*)
>> FROM news_feed
>> WHERE data @? '$.content ? (@ like_regex "^Start")';
>>
>> SELECT id, jsonb_path_query(data, '$.content')
>> FROM news_feed
>> WHERE data @? '$.content ? (@ like_regex "risk" flag "i")';
>>
>> I would find that the index scan performed as well as the sequential
>> scan. Additionally, on my laptop, the parallel sequential scan would
>> beat the index scan by ~2.5x in some cases.
>
> Yeah, this cases are not supported. Did optimizer automatically
> select sequential scan in this case (if not touching enable_*
> variables)? It should, because optimizer understands that GIN scan
> will be bad if extract_query method failed to extract anything.
It did not - it was doing a bitmap heap scan. I have default costs
setup. Example output from EXPLAIN ANALYZE with the index available:
Aggregate (cost=1539.78..1539.79 rows=1 width=8) (actual
time=270.419..270.419 rows=1 loops=1)
-> Bitmap Heap Scan on news_feed (cost=23.24..1538.73 rows=418
width=0) (actual time=84.040..270.407 rows=5 loops=1)
Recheck Cond: (data @? '$."length"?(@ < 150)'::jsonpath)
Rows Removed by Index Recheck: 418360
Heap Blocks: exact=28690
-> Bitmap Index Scan on news_feed_data_gin_idx
(cost=0.00..23.14 rows=418 width=0) (actual time=41.788..41.788
rows=418365 loops=1)
Index Cond: (data @? '$."length"?(@ < 150)'::jsonpath)
Planning Time: 0.168 ms
Execution Time: 271.105 ms
And for arguments sake, after I dropped the index (and
max_parallel_workers = 8):
Finalize Aggregate (cost=30998.07..30998.08 rows=1 width=8) (actual
time=91.062..91.062 rows=1 loops=1)
-> Gather (cost=30997.65..30998.06 rows=4 width=8) (actual
time=90.892..97.739 rows=5 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Partial Aggregate (cost=29997.65..29997.66 rows=1 width=8)
(actual time=76.977..76.977 rows=1 loops=5)
-> Parallel Seq Scan on news_feed (cost=0.00..29997.39
rows=104 width=0) (actual time=39.736..76.964 rows=1 loops=5)
Filter: (data @? '$."length"?(@ < 150)'::jsonpath)
Rows Removed by Filter: 83672
Planning Time: 0.127 ms
Execution Time: 97.801 ms
>> Reading up on what the GIN patch does, this all makes sense: it's
>> optimized for equality, I understand there are challenges to be able to
>> handle inequality, regex exps, etc. And the cases where it really does
>> work well, it's _incredibly_ fast.
>
> Yes, for more complex cases, we need different opclasses. For
> instance, we can consider porting jsquery opclasses to PG 13. And it
> become even more important to get parametrized opclasses, because we
> don't necessary want to index all the json fields in this same way.
> That's another challenge for future releases. But what we have now is
> just support for some of jsonpathes for existing opclasses.
Yeah, that makes sense, and seems to be my recollection from the several
years of presentations I've seen on the topic ;)
>> My suggestion would be adding some additional guidance in the user
>> documentation around how GIN works with the @@ and @? operators so they
>> can understand where GIN will work very well with JSON path + their data
>> and not be surprised when other types of JSON path queries are
>> performing on par with a sequential scan (or worse than a parallel seq
>> scan).
>
> Good point. Will do.
Thanks!
Jonathan
From | Date | Subject | |
---|---|---|---|
Next Message | Alexander Korotkov | 2019-03-17 17:02:41 | Re: jsonpath |
Previous Message | Alexander Korotkov | 2019-03-17 16:55:20 | Re: jsonpath |