Re: BUG #13538: REGEX non-greedy is working incorrectly (and also greedy matches fail if non-greedy is present)

From: Christian Mächler <christian_maechler(at)hotmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13538: REGEX non-greedy is working incorrectly (and also greedy matches fail if non-greedy is present)
Date: 2015-08-04 15:39:47
Message-ID: DUB128-W56A2C0B3D9D79B7D14F20EF8760@phx.gbl
Lists: pgsql-bugs

You say it is okay that a greedy group suddenly becomes non-greedy if ANOTHER group is made non-greedy?

I've chosen a simple example, but I'm pretty sure I could construct several use cases that are easy to solve if the regex behaves as in Java, JavaScript, Perl, etc., but not with how it is done here. It's clearly not a feature. Even simple things like ending a match with any number of digits become difficult if non-greedy groups are present: instead of ...([0-9]+) you will have to write ...([0-9]+)(?![0-9]), which hardly makes things easier...
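
To make the contrast concrete, here is a minimal sketch in Python, whose re module follows the Perl-style rules referred to above. The input string and the first pattern are borrowed from the PostgreSQL documentation page linked in the quoted reply below, which gives the result 1 for SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'). The variable names are illustrative only, and the second pattern is just the lookahead workaround applied to the same input.

```python
import re

text = "XY1234Z"

# Perl-style engines decide greediness per quantifier: the lazy Y*? takes as
# little as it can, while the greedy digit group still takes as much as it can.
plain = re.search(r"Y*?([0-9]{1,3})", text).group(1)

# The workaround from the message: a negative lookahead that forbids a
# trailing digit, so the digit group has to run to the end of the number.
guarded = re.search(r"Y*?([0-9]+)(?![0-9])", text).group(1)

print(plain)    # 123
print(guarded)  # 1234
```

In Python the lookahead only matters here because the first pattern caps the group at three digits. Under the rules described in the PostgreSQL documentation, the leading non-greedy quantifier makes the whole expression non-greedy, which is why the lookahead is needed there to get the full run of digits.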

Seriously, I didn't want to start a debate about whether this is right or wrong, because I honestly can't understand how anyone could defend the behavior mentioned in the first sentence of this message. As I said, I just wanted to point out a bug to help improve things, but if you prefer it like this, that is fine with me; I just think you probably haven't used regex that much.

Chris

> From: tgl(at)sss(dot)pgh(dot)pa(dot)us
> To: christian_maechler(at)hotmail(dot)com
> CC: david(dot)g(dot)johnston(at)gmail(dot)com; pgsql-bugs(at)postgresql(dot)org
> Subject: Re: [BUGS] BUG #13538: REGEX non-greedy is working incorrectly (and also greedy matches fail if non-greedy is present)
> Date: Tue, 4 Aug 2015 10:58:57 -0400
>
> Christian Mächler <christian_maechler(at)hotmail(dot)com> writes:
> > Here some more detailed examples to show why the behavior of the 3rd group is clearly wrong also according to the specification:
>
> What specification are you reading? The POSIX standard for regular
> expressions doesn't mention non-greedy quantifiers at all.
>
> As David says, these examples appear to be following what's stated in
> http://www.postgresql.org/docs/9.4/static/functions-matching.html#POSIX-MATCHING-RULES
> The Spencer regex engine we use has a notion of greediness or
> non-greediness of the entire regex, and further that that takes precedence
> for determining the overall match length over greediness of individual
> subexpressions. That behavior might be inconvenient for this particular
> use-case, but that doesn't make it a bug.
>
> regards, tom lane
