Re: Using POSIX Regular Expressions on xml type fields gives inconsistent results

From: "David Johnston" <polobo(at)yahoo(dot)com>
To: "'Denis Papathanasiou'" <denis(dot)papathanasiou(at)banrai(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Using POSIX Regular Expressions on xml type fields gives inconsistent results
Date: 2012-12-21 16:04:03
Message-ID: 009101cddf94$cc940920$65bc1b60$@yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> -----Original Message-----
> From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-
> owner(at)postgresql(dot)org] On Behalf Of Denis Papathanasiou
> Sent: Friday, December 21, 2012 10:43 AM
> To: pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] Using POSIX Regular Expressions on xml type fields
> gives inconsistent results
>
> On 12/21/2012 10:35 AM, David Johnston wrote:
> >>
> >> If you look at the four examples which follow the posix match table
> >> in the docs (http://www.postgresql.org/docs/9.1/static/functions-
> >> matching.html#FUNCTIONS-POSIX-TABLE),
> >> some of them work from the left side, e.g.:
> >>
> >> 'abc' ~ '(b|d)' true
> >>
> >> In my original example, I found I could write this from left to right
> >> like
> > this,
> >> and it would still work:
> >>
> >> '(b|d)' ~ 'abc' true
> >
> > Really???
> >
> > Testing on Windows 9.0.4 this expression returns FALSE, not true as
> > you claim. Please try again and reply with detailed version
> > information and the exact query(s) used if you can get the behavior to
> repeat itself.
>
> => select id from form_d where '(kumar|gonzales)' ~* any(
> CAST(xpath('//PersonsList/PersonName/LastName/text()', data) as TEXT[])
> );
>
> This expression returned (correctly) the information I wanted.
>
> Shouldn't it *not* have worked b/c the '(kumar|gonzales)' regex is on the
> left?

It worked for three reasons:

1) The expression on the right-hand side was a valid regular expression
2) The value on the left-hand side contained a string that happened to
exactly match the expression on the right-hand side
3) The "(kumar|gonzales)" text, while it happens to look like a regular
expression, is just plain text because it appears on the left-hand side of
the operator.

Arguably "regular expression" should have its own data type just like JSON
and XML since RegEx is text-like but with special validation
characteristics. Lacking that the system is unable to help in this
situation. You may have gotten help if someone had a name that did not
resolve to a valid regular expression - in which case the match attempt
would have raised an error.

You said above that '(b|d') ~ 'abc' returned TRUE for you. Did you actually
test that exact (simple) expression or did you immediately jump to your
convoluted example with XML and ANY(array)?

>
> I was hoping to be able to use the xml type field to be able to do '^a'
> type searches on names (e.g. a search for 'Fred' would also match
'Frederick',
> etc.) but since it seems that I cannot, I'll look at different ways of
solving this
> problem.
>

See Tom's suggestion of creating a custom function and operator that
reverses the order of the two text fields.

David J.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2012-12-21 16:08:11 Re: Using POSIX Regular Expressions on xml type fields gives inconsistent results
Previous Message jg 2012-12-21 15:57:40 Re: Coalesce bug ?