using regular expression queries with regular expression indexes

From: Mark Steben <mark(dot)steben(at)drivedominion(dot)com>
To: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: using regular expression queries with regular expression indexes
Date: 2020-06-25 16:45:55
Message-ID: CADyzmyykegZ3ZSVm9hYYKirJYGUYYAx_cK1VQYPCz1q+EpQ89g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Good afternoon,
We are currently running postgres version 12.2. I have a query that uses
regexp code to confirm if a text field is a date or not. It is resource
intensive and long running. I have made a regexp index (actually several
flavors of regexp indexes) to aid in this queries performance. However I
cannot seem to 'convince' the planner to use the indexes I create.
Following is table/index construction for your reference:

\d customertab
Table "public.customertab"
Column | Type | Collation |
Nullable | Default
---------------------------+-----------------------------+-----------+----------+-------------------------------------------------
id | bigint | | not
null | nextval('customer_attributes_id_seq'::regclass)
customer_attribute_key_id | integer | |
|
attributable_id | integer | |
|
attributable_type | character varying(255) | |
|
value | character varying(255) | |
|
created_at | timestamp without time zone | |
|
updated_at | timestamp without time zone | |
|
group_num | character varying(255) | |
|
customer_transaction_id | integer | |
|
account_id | integer | |
|
Indexes:
"primets_custattr_pkey" PRIMARY KEY, btree (id)
"custattr_account_id" btree (account_id)
"*custattr_value*" btree (value) WHERE value::text ~
'^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text
AND value::text ~
'([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text
AND account_id IS NOT NULL

"*custattr_value_acctid_keyid*" btree (value, account_id,
customer_attribute_key_id) WHERE value::text ~
'^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text
AND value::text ~
'([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text
AND account_id IS NOT NULL

"*custattr_value_includes*" btree (value) INCLUDE (id, account_id,
group_num) WHERE value::text ~
'^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text
AND value::text ~
'([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text
AND account_id IS NOT NULL

The query I am attempting to run:

explain select * FROM customertab ca
where ca.value::text ~
'^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text
AND ca.value::text ~
'([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text
AND ca.account_id IS NOT NULL

and the explain:
Bitmap Heap Scan on customertab ca (cost=9907.40..903793.74 rows=479041
width=95)
Recheck Cond: (((value)::text ~
'^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text)
AND ((value)::text ~
'([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0
-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text)
AND (accou
nt_id IS NOT NULL))
-> Bitmap Index Scan on custattr_value (cost=0.00..9787.64 rows=479041
width=0)
(3 rows)

Finally if I remove the 'ca.account_id is not null' expression

explain select * FROM customertab ca
where ca.value::text ~
'^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text
AND ca.value::text ~
'([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text

This is the explain:
Gather (cost=1000.00..1641792.30 rows=479041 width=95)
Workers Planned: 2
-> Parallel Seq Scan on customertab ca (cost=0.00..1592888.20
rows=199600 width=95)
Filter: (((value)::text ~
'^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text)
AND ((value)::text ~
'([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0
-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text))
(4 rows)

Is there something else I can try, or am I missing something? Comments
welcome.

--
*Mark Steben*
Database Administrator
@utoRevenue <http://www.autorevenue.com/> | Autobase
<http://www.autobase.net/>
CRM division of Dominion Dealer Solutions
95D Ashley Ave.
West Springfield, MA 01089
t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
www.drivedominion.com <http://www.autorevenue.com/>

<http://autobasedigital.net/marketing/DD12_sig.jpg>

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Tom Lane 2020-06-25 16:57:49 Re: using regular expression queries with regular expression indexes
Previous Message Thomas Kellerer 2020-06-25 13:58:37 Re: Performance of SELECT directly and from application