jsonb case insensitive search

From: armand pirvu <armand(dot)pirvu(at)gmail(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: jsonb case insensitive search
Date: 2017-05-31 19:18:21
Message-ID: 287C7A09-A660-49AB-ADA8-866121D5156C@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi

The goal would be to be able to search case insensitive by any key:value combined with some other columns like outlined below, but initially would be Company:CompuTestSystems

A sample would be

SELECT *
FROM cfg_files_data
WHERE cfg_files_data.show_id = 32
AND cfg_files_data.file_id = 123
AND lower((file_data_record ->> 'Company')) = lower('CompuTestSystems');
;

- Table definition
Table "csischema.cfg_files_data"
Column | Type | Modifiers
--------------------+--------------------------+-----------------------------------------------------------------------
file_data_id | bigint | not null default nextval('cfg_files_data_file_data_id_seq'::regclass)
file_id | bigint | not null
show_id | bigint | not null
file_data_record | jsonb | not null
additional_info | jsonb |
file_data_add_by | character varying(100) | not null
file_data_add_date | timestamp with time zone | not null default now()
Indexes:
"cfg_files_data_pkey" PRIMARY KEY, btree (file_data_id, show_id)

- show_id, file_id data distribution

select show_id, file_id, count(*) from cfg_files_data group by show_id, file_id order by 1;
show_id | file_id | count
---------+---------+-------
4 | 9 | 3
4 | 68 | 22
4 | 2 | 6
6 | 3 | 13
6 | 5 | 215
13 | 13 | 13
13 | 8 | 22
21 | 11 | 13
21 | 10 | 22
26 | 12 | 13
30 | 16 | 6
32 | 123 | 53731
35 | 125 | 5
40 | 122 | 53731
46 | 69 | 4
46 | 64 | 4
46 | 67 | 4
46 | 70 | 4

For the example mentioned

SELECT *
FROM cfg_files_data
WHERE cfg_files_data.show_id = 32
AND cfg_files_data.file_id = 123
AND lower((file_data_record ->> 'Company')) = lower('CompuTestSystems');
;

create index cfg_files_data_record_idx on cfg_files_data (show_id, file_id, lower(file_data_record::text));

SELECT *
FROM cfg_files_data
WHERE cfg_files_data.show_id = 35
AND cfg_files_data.file_id = 125
AND lower((file_data_record ->> 'Company')) = lower('CompuTestSystems');
;

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using cfg_files_data_record_idx on cfg_files_data (cost=0.54..8.57 rows=1 width=359) (actual time=0.287..0.305 rows=5 loops=1)
Index Cond: ((show_id = 35) AND (file_id = 125))
Filter: (lower((file_data_record ->> 'Company'::text)) = 'computestsystems'::text)
Planning time: 0.271 ms
Execution time: 0.370 ms
(5 rows)

SELECT *
FROM cfg_files_data
WHERE cfg_files_data.show_id = 32
AND cfg_files_data.file_id = 123
AND lower((file_data_record ->> 'Company')) = lower('CompuTestSystems');
;

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Seq Scan on cfg_files_data (cost=0.00..7276.20 rows=134 width=359) (actual time=194.817..194.848 rows=2 loops=1)
Filter: ((show_id = 32) AND (file_id = 123) AND (lower((file_data_record ->> 'Company'::text)) = 'computestsystems'::text))
Rows Removed by Filter: 107829
Planning time: 1.006 ms
Execution time: 194.905 ms

Shall I understand that prior to apply the lower((file_data_record ->> 'Company'::text)) = 'computestsystems'::text) filtering, it goes by show_id and field_id and for 32,123 since it is way above 5% it gets into a seq scan ? Although the end result is just 2 rows

SET enable_seqscan = OFF;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on cfg_files_data (cost=2485.18..8544.40 rows=134 width=359) (actual time=173.314..173.317 rows=2 loops=1)
Recheck Cond: (show_id = 32)
Filter: ((file_id = 123) AND (lower((file_data_record ->> 'Company'::text)) = 'computestsystems'::text))
Rows Removed by Filter: 53729
Heap Blocks: exact=2437
-> Bitmap Index Scan on cfg_files_data_pkey (cost=0.00..2485.15 rows=53743 width=0) (actual time=12.195..12.195 rows=53731 loops=1)
Index Cond: (show_id = 32)
Planning time: 0.232 ms
Execution time: 173.392 ms
(9 rows)

And that got me to look at

SELECT *
FROM cfg_files_data
WHERE
lower((file_data_record ->> 'Company')) = lower('CompuTestSystems');
;

drop index cfg_files_data_record_idx ;
create index cfg_files_data_record_idx on cfg_files_data (lower(file_data_record::text));
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Seq Scan on cfg_files_data (cost=0.00..6737.04 rows=539 width=359) (actual time=194.382..194.468 rows=8 loops=1)
Filter: (lower((file_data_record ->> 'Company'::text)) = 'computestsystems'::text)
Rows Removed by Filter: 107823
Planning time: 0.421 ms
Execution time: 194.539 ms

Well the end result is 8 rows from a total of 100k+

Not sure why the index is ignored

But is is possible to go for a broader search, aka being able to search by any key:value , efficient and case insensitive ? What am I missing in this picture ?

Thanks for help

Armand

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David G. Johnston 2017-05-31 19:32:20 Re: jsonb case insensitive search
Previous Message Magnus Hagander 2017-05-31 18:57:18 Re: Re: [GENERAL] pg_basebackup error: replication slot "pg_basebackup_2194" already exists