Re: querying with index on jsonb slower than standard column. Why?

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tim Dudgeon <tdudgeon(dot)ml(at)gmail(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: querying with index on jsonb slower than standard column. Why?
Date: 2014-12-08 15:31:21
Message-ID: 5485C449.4020204@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance pgsql-sql

On 12/07/2014 05:28 PM, Tom Lane wrote:
> Tim Dudgeon <tdudgeon(dot)ml(at)gmail(dot)com> writes:
>> The index created is not a gin index. Its a standard btree index on the
>> data extracted from the json. So the indexes on the standard columns and
>> the ones on the 'fields' extracted from the json seem to be equivalent.
>> But perform differently.
>
> I don't see any particular difference ...
>
> regression=# explain analyze select count(*) from json_test where (data->>'assay1_ic50')::float > 90
> and (data->>'assay2_ic50')::float < 10;
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=341613.79..341613.80 rows=1 width=0) (actual time=901.207..901.208 rows=1 loops=1)
> -> Bitmap Heap Scan on json_test (cost=123684.69..338836.02 rows=1111111 width=0) (actual time=497.982..887.128 rows=100690 loops=1)
> Recheck Cond: ((((data ->> 'assay2_ic50'::text))::double precision < 10::double precision) AND (((data ->> 'assay1_ic50'::text))::double precision > 90::double precision))
> Heap Blocks: exact=77578
> -> BitmapAnd (cost=123684.69..123684.69 rows=1111111 width=0) (actual time=476.585..476.585 rows=0 loops=1)
> -> Bitmap Index Scan on idx_data_json_assay2_ic50 (cost=0.00..61564.44 rows=3333333 width=0) (actual time=219.287..219.287 rows=999795 loops=1)
> Index Cond: (((data ->> 'assay2_ic50'::text))::double precision < 10::double precision)
> -> Bitmap Index Scan on idx_data_json_assay1_ic50 (cost=0.00..61564.44 rows=3333333 width=0) (actual time=208.197..208.197 rows=1000231 loops=1)
> Index Cond: (((data ->> 'assay1_ic50'::text))::double precision > 90::double precision)
> Planning time: 0.128 ms
> Execution time: 904.196 ms
> (11 rows)
>
> regression=# explain analyze select count(*) from json_test where assay1_ic50 > 90 and assay2_ic50 < 10;
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=197251.24..197251.25 rows=1 width=0) (actual time=895.238..895.238 rows=1 loops=1)
> -> Bitmap Heap Scan on json_test (cost=36847.25..197003.24 rows=99197 width=0) (actual time=495.427..881.033 rows=100690 loops=1)
> Recheck Cond: ((assay2_ic50 < 10::double precision) AND (assay1_ic50 > 90::double precision))
> Heap Blocks: exact=77578
> -> BitmapAnd (cost=36847.25..36847.25 rows=99197 width=0) (actual time=474.201..474.201 rows=0 loops=1)
> -> Bitmap Index Scan on idx_data_col_assay2_ic50 (cost=0.00..18203.19 rows=985434 width=0) (actual time=219.060..219.060 rows=999795 loops=1)
> Index Cond: (assay2_ic50 < 10::double precision)
> -> Bitmap Index Scan on idx_data_col_assay1_ic50 (cost=0.00..18594.21 rows=1006637 width=0) (actual time=206.066..206.066 rows=1000231 loops=1)
> Index Cond: (assay1_ic50 > 90::double precision)
> Planning time: 0.129 ms
> Execution time: 898.237 ms
> (11 rows)
>
> regression=# \timing
> Timing is on.
> regression=# select count(*) from json_test where (data->>'assay1_ic50')::float > 90
> and (data->>'assay2_ic50')::float < 10;
> count
> --------
> 100690
> (1 row)
>
> Time: 882.607 ms
> regression=# select count(*) from json_test where assay1_ic50 > 90 and assay2_ic50 < 10;
> count
> --------
> 100690
> (1 row)
>
> Time: 881.071 ms
>
> regards, tom lane
>
>

Running the above on my machine I do see the slow down the OP reports. I ran it several times
and it stayed around 3.5x. It might be interesting to get the OS and architecture information
from the OP.

test=# select version();
version
------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 9.4rc1 on i686-pc-linux-gnu, compiled by gcc (SUSE Linux) 4.8.1 20130909 [gcc-4_8-branch revision 202388], 32-bit
(1 row)

test=# \timing
Timing is on.
test=# select count(*) from json_test where (data->>'assay1_ic50')::float > 90
test-# and (data->>'assay2_ic50')::float < 10;
count
-------
99288
(1 row)

Time: 9092.966 ms

test=# select count(*) from json_test where assay1_ic50 > 90 and assay2_ic50 < 10;
count
-------
99288
(1 row)

Time: 2542.294 ms

explain analyze select count(*) from json_test where (data->>'assay1_ic50')::float > 90
and (data->>'assay2_ic50')::float < 10;

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=332209.79..332209.80 rows=1 width=0) (actual time=8980.009..8980.009 rows=1 loops=1)
-> Bitmap Heap Scan on json_test (cost=123684.69..329432.02 rows=1111111 width=0) (actual time=538.688..8960.308 rows=99288 loops=1)
Recheck Cond: ((((data ->> 'assay2_ic50'::text))::double precision < 10::double precision) AND (((data ->> 'assay1_ic50'::text))::double precision > 90::double precision))
Rows Removed by Index Recheck: 7588045
Heap Blocks: exact=20894 lossy=131886
-> BitmapAnd (cost=123684.69..123684.69 rows=1111111 width=0) (actual time=531.066..531.066 rows=0 loops=1)
-> Bitmap Index Scan on idx_data_json_assay2_ic50 (cost=0.00..61564.44 rows=3333333 width=0) (actual time=258.717..258.717 rows=998690 loops=1)
Index Cond: (((data ->> 'assay2_ic50'::text))::double precision < 10::double precision)
-> Bitmap Index Scan on idx_data_json_assay1_ic50 (cost=0.00..61564.44 rows=3333333 width=0) (actual time=251.664..251.664 rows=997880 loops=1)
Index Cond: (((data ->> 'assay1_ic50'::text))::double precision > 90::double precision)
Planning time: 0.391 ms
Execution time: 8980.391 ms
(12 rows)

test=# explain analyze select count(*) from json_test where assay1_ic50 > 90 and assay2_ic50 < 10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=196566.38..196566.39 rows=1 width=0) (actual time=2609.545..2609.545 rows=1 loops=1)
-> Bitmap Heap Scan on json_test (cost=37869.00..196304.39 rows=104796 width=0) (actual time=550.273..2590.093 rows=99288 loops=1)
Recheck Cond: ((assay2_ic50 < 10::double precision) AND (assay1_ic50 > 90::double precision))
Rows Removed by Index Recheck: 7588045
Heap Blocks: exact=20894 lossy=131886
-> BitmapAnd (cost=37869.00..37869.00 rows=104796 width=0) (actual time=542.666..542.666 rows=0 loops=1)
-> Bitmap Index Scan on idx_data_col_assay2_ic50 (cost=0.00..18871.73 rows=1021773 width=0) (actual time=263.959..263.959 rows=998690 loops=1)
Index Cond: (assay2_ic50 < 10::double precision)
-> Bitmap Index Scan on idx_data_col_assay1_ic50 (cost=0.00..18944.62 rows=1025624 width=0) (actual time=257.912..257.912 rows=997880 loops=1)
Index Cond: (assay1_ic50 > 90::double precision)
Planning time: 0.834 ms
Execution time: 2609.960 ms
(12 rows)

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2014-12-08 15:46:12 Re: querying with index on jsonb slower than standard column. Why?
Previous Message Tom Lane 2014-12-08 01:28:10 Re: querying with index on jsonb slower than standard column. Why?

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 2014-12-08 15:46:12 Re: querying with index on jsonb slower than standard column. Why?
Previous Message Tom Lane 2014-12-08 01:28:10 Re: querying with index on jsonb slower than standard column. Why?