Re: Regex Query Index question

From: David Johnston <polobo(at)yahoo(dot)com>
To: Naoko Reeves <naokoreeves(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Regex Query Index question
Date: 2011-08-11 23:35:43
Message-ID: 1EBCDA78-0C15-4C12-8B99-232CEDF23CAC@yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Aug 11, 2011, at 18:26, Naoko Reeves <naokoreeves(at)gmail(dot)com> wrote:

> Hello,
> I have query phone number in database as follows:
> [123) 456-7890
>
> (123) 456-7890
>
Store phone numbers without formatting...the data is the numbers themselves the formatting is presentation.
> When I query like this:
>
> SELECT * FROM phone
>
> WHERE phone_number ~ ('^\[123' || '[-\s\)]{0,2}' || '456' || '[-\s\)]{0,2}' || '7890')
>
> it use Index but if I query like this (notice first character is open parenthesis instead of open square blacket ) :
>
> SELECT phn_fk_key FROM phn WHERE
>
> phn_fk_table = 14
>
> AND llx_decrypt(phn_phone_enc) ~ ('^\(123' || '[-\s\)]{0,2}' || '456' || '[-\s\)]{0,2}' || '7890')
>
> It doesn't use Index....
>
The left side of the two where clauses are different fields/expressions. Since you do not specify what your table and indexes look like your "problem" is impossible to solve but likely has nothing to do with RegEx. Keep in mind, however, that an index can only be used if the pattern is fully anchored. With alternation in the RegEx you want the"^" outside of the part the part that uses "|" otherwise only the first expression ends up being anchored. E.g, '^(a|b)' !='^a|b'. The first one matches a string that stars with a or b whereas the second matches a string that starts with a or contains b anywhere in the string. The second one cannot use the index since it is not guaranteed to be anchored at the start of a string.
> co-worker suggested me to use chr(40) instead so I tried this:
>
> SELECT phn_fk_key FROM phn WHERE
>
> phn_fk_table = 14
>
> AND llx_decrypt(phn_phone_enc) ~ ('^\' || chr(40) || '123' || '[-\s\)]{0,2}' || '456' || '[-\s\)]{0,2}' || '7890')
>
> No success...
>
> Also { and period doesn't seems to use index either.... but } ) [ ] $ # works.Could you guide me to right direction for me please?
>
Particularly with RegEx you want to tell people what you are trying to do and not just give the expressions themselves.

Not testing here but... and ignore whitespace

'^( [ \[\( ]? \s* \d{3} \s* [ -\s\]\) ] \d{3} [ -\s ] \d{4} )$'

The above should match both of your samples and use the index on the regular phone column. If you want to store encrypted and search the unencrypted you have to create a functional index. See documentation for syntax and requirements.

In this case you can replace the \d{n} with your desired search strings.

It would be a lot simpler if you strip out the non-numbers, via functional index if needed, and perform an equality string search. The question becomes, using the example data above, what happens if two people have the same phone number with only the format being different. The answer is the difference between a unique index and a non-unique one...

example: create index name on table (clean_and_decrypt_phone(enc_phone))

Where clean_and_decrypt_phone(enc_phone) = clean_phone( search_string )

This can be done without changing columns, only indexes and queries.

David J.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David Johnston 2011-08-11 23:50:19 Re: COPY from .csv File and Remove Duplicates
Previous Message Rich Shepard 2011-08-11 23:33:25 Re: suggestions for archive_command to a remote standby