| From: | Mark Steben <mark(dot)steben(at)drivedominion(dot)com> | 
|---|---|
| To: | pgsql-admin <pgsql-admin(at)postgresql(dot)org> | 
| Subject: | using regular expression queries with regular expression indexes | 
| Date: | 2020-06-25 16:45:55 | 
| Message-ID: | CADyzmyykegZ3ZSVm9hYYKirJYGUYYAx_cK1VQYPCz1q+EpQ89g@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-admin | 
Good afternoon,
We are currently running postgres version 12.2. I have a query that uses
regexp code to confirm if a text field is a date or not.  It is resource
intensive and long running.  I have made a regexp index (actually several
flavors of regexp indexes) to aid in this queries performance. However I
cannot seem to 'convince' the planner to use the indexes I create.
 Following is table/index construction for your reference:
 \d customertab
                                                Table "public.customertab"
          Column           |            Type             | Collation |
Nullable |                     Default
---------------------------+-----------------------------+-----------+----------+-------------------------------------------------
 id                        | bigint                      |           | not
null | nextval('customer_attributes_id_seq'::regclass)
 customer_attribute_key_id | integer                     |           |
     |
 attributable_id           | integer                     |           |
     |
 attributable_type         | character varying(255)      |           |
     |
 value                     | character varying(255)      |           |
     |
 created_at                | timestamp without time zone |           |
     |
 updated_at                | timestamp without time zone |           |
     |
 group_num                 | character varying(255)      |           |
     |
 customer_transaction_id   | integer                     |           |
     |
 account_id                | integer                     |           |
     |
Indexes:
    "primets_custattr_pkey" PRIMARY KEY, btree (id)
    "custattr_account_id" btree (account_id)
    "*custattr_value*" btree (value) WHERE value::text ~
'^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text
AND value::text ~
'([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text
AND account_id IS NOT NULL
    "*custattr_value_acctid_keyid*" btree (value, account_id,
customer_attribute_key_id) WHERE value::text ~
'^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text
AND value::text ~
'([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text
AND account_id IS NOT NULL
"*custattr_value_includes*" btree (value) INCLUDE (id, account_id,
group_num) WHERE value::text ~
'^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text
AND value::text ~
'([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text
AND account_id IS NOT NULL
The query I am attempting to run:
explain select * FROM customertab ca
                 where ca.value::text ~
'^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text
                 AND ca.value::text ~
'([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text
                   AND ca.account_id IS NOT NULL
and the explain:
 Bitmap Heap Scan on customertab ca  (cost=9907.40..903793.74 rows=479041
width=95)
   Recheck Cond: (((value)::text ~
'^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text)
AND ((value)::text ~
'([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0
-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text)
AND (accou
nt_id IS NOT NULL))
   ->  Bitmap Index Scan on custattr_value  (cost=0.00..9787.64 rows=479041
width=0)
(3 rows)
Finally if I remove the 'ca.account_id is not null' expression
  explain select * FROM customertab ca
                 where ca.value::text ~
'^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text
                 AND ca.value::text ~
'([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text
This is the explain:
 Gather  (cost=1000.00..1641792.30 rows=479041 width=95)
   Workers Planned: 2
   ->  Parallel Seq Scan on customertab ca  (cost=0.00..1592888.20
rows=199600 width=95)
         Filter: (((value)::text ~
'^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text)
AND ((value)::text ~
'([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0
-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text))
(4 rows)
Is there something else I can try, or am I missing something?  Comments
welcome.
-- 
*Mark Steben*
 Database Administrator
@utoRevenue <http://www.autorevenue.com/> | Autobase
<http://www.autobase.net/>
  CRM division of Dominion Dealer Solutions
95D Ashley Ave.
West Springfield, MA 01089
t: 413.327-3045
f: 413.383-9567
www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
 www.drivedominion.com <http://www.autorevenue.com/>
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tom Lane | 2020-06-25 16:57:49 | Re: using regular expression queries with regular expression indexes | 
| Previous Message | Thomas Kellerer | 2020-06-25 13:58:37 | Re: Performance of SELECT directly and from application |