Re: BUG #13538: REGEX non-greedy is working incorrectly (and also greedy matches fail if non-greedy is present)

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christian Mächler <christian_maechler(at)hotmail(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13538: REGEX non-greedy is working incorrectly (and also greedy matches fail if non-greedy is present)
Date: 2015-08-04 16:02:15
Message-ID: CAKFQuwbn0nYSQL99rn=WSsfKYrSra5cd3GiQ3iH_rnHHGic1_g@mail.gmail.com
Lists: pgsql-bugs

On Tue, Aug 4, 2015 at 8:39 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> I wrote:
> > As David says, these examples appear to be following what's stated in
> > http://www.postgresql.org/docs/9.4/static/functions-matching.html#POSIX-MATCHING-RULES
> > The Spencer regex engine we use has a notion of greediness or
> > non-greediness of the entire regex, and further that that takes
> > precedence for determining the overall match length over greediness
> > of individual subexpressions. That behavior might be inconvenient for
> > this particular use-case, but that doesn't make it a bug.
>
> BTW, perhaps it would be worth adding an example to that section that
> shows how to control this behavior. The trick is obvious once you've seen
> it, but not so much otherwise: you add something to the start of the regex
> that establishes the overall greediness you want, but can never actually
> match any characters. "\0*" or "\0*?" will work fine in Postgres
> use-cases since there can never be a NUL character in the data.
>
> regression=# select regexp_matches('abc01234xyz', '(.*)(\d+)(.*)');
> regexp_matches
> -----------------
> {abc0123,4,xyz}
> (1 row)
>
> regression=# select regexp_matches('abc01234xyz', '(.*?)(\d+)(.*)');
> regexp_matches
> ----------------
> {abc,0,""}
> (1 row)
>
> regression=# select regexp_matches('abc01234xyz', '\0*(.*?)(\d+)(.*)');
> regexp_matches
> -----------------
> {abc,01234,xyz}
> (1 row)
>
>
+1

David J.
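To make the rule Tom describes concrete, here is a toy model in Python. It is an illustrative assumption, not PostgreSQL's Spencer engine and not code from this thread, and it covers only patterns shaped like the three queries above. The model works in four steps. First, the first quantified atom sets the greediness of the whole match. Second, the leftmost start position wins. Third, the overall match is made as long or as short as possible. Only after that does each subexpression, left to right, take as much or as little as its own quantifier asks. Python's built-in `re` module cannot demonstrate this, because it applies greediness per quantifier (Perl style). The names `toy_posix_match`, `GREEDY_ANY`, `LAZY_ANY`, `DIGITS` and the `(predicate, min_len, greedy)` triples are hypothetical. `prefix_greedy` stands in for a never-matching `\0*` or `\0*?` prefix.

```python
# Toy model (assumption, NOT PostgreSQL's regex engine) of the rule that
# whole-RE greediness fixes the match length first, and subexpressions are
# resolved afterwards, left to right, by their own greediness.


def any_char(c):
    return True  # behaves like "."


def digit(c):
    return c.isdigit()  # behaves like "\d"


# Each group is a hypothetical (predicate, min_len, greedy) triple.
GREEDY_ANY = (any_char, 0, True)   # (.*)
LAZY_ANY = (any_char, 0, False)    # (.*?)
DIGITS = (digit, 1, True)          # (\d+)


def decompositions(text, start, groups):
    """Yield every tuple of group substrings matching consecutively from start."""
    def rec(pos, idx):
        if idx == len(groups):
            yield ()
            return
        ok, min_len, _ = groups[idx]
        for end in range(pos + min_len, len(text) + 1):
            piece = text[pos:end]
            if not all(ok(c) for c in piece):
                break  # every longer piece contains the same bad character
            for rest in rec(end, idx + 1):
                yield (piece,) + rest
    yield from rec(start, 0)


def toy_posix_match(text, groups, prefix_greedy=None):
    """Return the list of group captures, or None if there is no match.

    prefix_greedy models a leading atom that can never match anything,
    like "\\0*" (True) or "\\0*?" (False); None means no such prefix.
    """
    # 1. The first quantified atom decides the greediness of the whole RE.
    overall_greedy = groups[0][2] if prefix_greedy is None else prefix_greedy
    # 2. The leftmost start position that admits any match wins.
    for start in range(len(text) + 1):
        cands = list(decompositions(text, start, groups))
        if not cands:
            continue
        # 3. Overall match as long (greedy) or as short (non-greedy) as possible.
        totals = [sum(len(p) for p in c) for c in cands]
        target = max(totals) if overall_greedy else min(totals)
        cands = [c for c, n in zip(cands, totals) if n == target]

        # 4. Then each group, left to right, by its own greediness.
        def rank(c):
            return tuple(len(p) if g[2] else -len(p) for p, g in zip(c, groups))

        return list(max(cands, key=rank))
    return None


s = "abc01234xyz"
print(toy_posix_match(s, [GREEDY_ANY, DIGITS, GREEDY_ANY]))
# ['abc0123', '4', 'xyz']
print(toy_posix_match(s, [LAZY_ANY, DIGITS, GREEDY_ANY]))
# ['abc', '0', '']
print(toy_posix_match(s, [LAZY_ANY, DIGITS, GREEDY_ANY], prefix_greedy=True))
# ['abc', '01234', 'xyz']
```

In this model the three calls reproduce the three psql results quoted above. The greedy prefix changes only step 1. It flips the overall match back to "as long as possible" without consuming any characters. Within that longest match, the lazy first group stays short and the greedy `\d+` takes every digit.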
