Re: PostgreSQL 9.3.5 substring(text from pattern for escape) bug

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Daniel Verite" <daniel(at)manitou-mail(dot)org>, "Robert Schreiber" <bobschreiber(at)charter(dot)net>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: PostgreSQL 9.3.5 substring(text from pattern for escape) bug
Date: 2019-05-12 03:19:23
Message-ID: 87mujsgz14.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

>>> What appears to be happening is that the _Q? on the tail_ is being
>>> matched, but _the Q is also being included in the collected data.__

>> You seem to expect that % must be non-greedy and let the final Q?
>> match 1 Q instead of 0, but there doesn't appear to be anything in
>> the doc that supports this interpretation. In fact, it mentions that
>> "%" is comparable to ".*" in POSIX regular expressions, and the
>> latter _is_ greedy.

Tom> Right. You could get the behavior you want using a non-greedy
Tom> quantifier, but you'd have to use the POSIX regexp functions, not
Tom> substring().

I looked up the spec on this point. As far as I can see, we're not
following it, but neither does the spec do what the OP wanted; in fact
the result should have included the _leading_ Q as well as the trailing
one.

The relevant part of SQL2016 seems to be this:

6.32 <string value function>

General Rules

6) If <regular expression substring function> is specified, then:

[...rules that split the pattern into 'R1#"R2#"R3' omitted...]

h) Otherwise, the result S of the <regular expression substring
function> is computed as follows:

i) Let S1 be the shortest initial substring of C such that there is
a substring S23 of C such that the value of the following <search
condition> is True:

'C' = 'S1' || 'S23' AND
'S1' SIMILAR TO 'R1' ESCAPE 'E' AND
'S23' SIMILAR TO '(R2R3)' ESCAPE 'E'

ii) Let S3 be the shortest final substring of S23 such that there is
a substring S2 of S23 such that the value of the following
<search condition> is True:

'S23' = 'S2' || 'S3' AND
'S2' SIMILAR TO 'R2' ESCAPE 'E' AND
'S3' SIMILAR TO 'R3' ESCAPE 'E'

iii) The result of the <regular expression substring function> is S2.

--
Andrew (irc:RhodiumToad)

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2019-05-12 03:50:43 Re: PostgreSQL 9.3.5 substring(text from pattern for escape) bug
Previous Message Tom Lane 2019-05-11 15:59:21 Re: PostgreSQL 9.3.5 substring(text from pattern for escape) bug