Re: BUG #13538: REGEX non-greedy is working incorrectly (and also greedy matches fail if non-greedy is present)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: christian_maechler(at)hotmail(dot)com
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13538: REGEX non-greedy is working incorrectly (and also greedy matches fail if non-greedy is present)
Date: 2015-08-04 15:39:46
Message-ID: 32324.1438702786@sss.pgh.pa.us
Lists: pgsql-bugs

I wrote:
> As David says, these examples appear to be following what's stated in
> http://www.postgresql.org/docs/9.4/static/functions-matching.html#POSIX-MATCHING-RULES
> The Spencer regex engine we use has a notion of greediness or
> non-greediness of the entire regex, and further that that takes precedence
> for determining the overall match length over greediness of individual
> subexpressions. That behavior might be inconvenient for this particular
> use-case, but that doesn't make it a bug.

BTW, perhaps it would be worth adding an example to that section that
shows how to control this behavior. The trick is obvious once you've seen
it, but not so much otherwise: add something to the start of the regex
that establishes the overall greediness you want but never actually
matches any characters. "\0*" (greedy) or "\0*?" (non-greedy) works fine
in Postgres use-cases, since text data can never contain a NUL character.

regression=# select regexp_matches('abc01234xyz', '(.*)(\d+)(.*)');
regexp_matches
-----------------
{abc0123,4,xyz}
(1 row)

regression=# select regexp_matches('abc01234xyz', '(.*?)(\d+)(.*)');
regexp_matches
----------------
{abc,0,""}
(1 row)

regression=# select regexp_matches('abc01234xyz', '\0*(.*?)(\d+)(.*)');
regexp_matches
-----------------
{abc,01234,xyz}
(1 row)

regards, tom lane
