Re: BUG #17761: Questionable regular expression behavior

From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: kosiodg(at)yahoo(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17761: Questionable regular expression behavior
Date: 2023-01-27 12:42:34
Message-ID: Y9PGupnpVoN/uQ2w@depesz.com
Lists: pgsql-bugs

On Fri, Jan 27, 2023 at 09:27:35AM +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference: 17761
> Logged by: Konstantin Geordzhev
> Email address: kosiodg(at)yahoo(dot)com
> PostgreSQL version: 11.10
> Operating system: tested online
> Description:
>
> Executing:
> select regexp_matches('a 1x1250x2500',
> '(a).*?([1-9]\d*)\s*x\s*([1-9]\d*)(?:\s*x\s*([1-9]\d*))?');
> returns: {a,1,1,NULL}
> while executing:
> select regexp_matches('a 1x1250x2500',
> '(a|b).*?([1-9]\d*)\s*x\s*([1-9]\d*)(?:\s*x\s*([1-9]\d*))?');
> returns: {a,1,1250,2500}
>
> Shouldn't both results be equal?

The problem is, AFAIR, that pg's regexp engine keeps some state that
makes the greedy/non-greedy decision once per regexp, not once per quantifier.

I don't recall the details, but my takeaway from when I learned about it
(years ago) is to avoid things like .*?
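
For reference, the "Regular Expression Matching Rules" section of the
PostgreSQL manual treats greediness as an attribute of the RE as a whole:
a branch takes the greediness of the first quantified atom in it that has
one, and an RE with two or more branches connected by | is always greedy.
A minimal sketch using the manual's own example (the result comments are
the values given in the manual, not output rerun on 11.10):

```sql
-- First quantified atom is Y* (greedy), so the RE as a whole
-- matches as long as possible.
SELECT substring('XY1234Z', 'Y*([0-9]{1,3})');   -- manual reports: 123

-- First quantified atom is Y*? (non-greedy), so the RE as a whole
-- matches as short as possible.
SELECT substring('XY1234Z', 'Y*?([0-9]{1,3})');  -- manual reports: 1
```

That rule seems consistent with the report: with (a) the first quantified
atom is .*?, so the whole pattern appears to go non-greedy and stops at
"1x1", while (a|b) is a multi-branch RE and so is greedy.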

Instead, you can replace .*? with \D*:

#v+
$ select regexp_matches('a 1x1250x2500', '(a)\D*([1-9]\d*)\s*x\s*([1-9]\d*)(?:\s*x\s*([1-9]\d*))?');
regexp_matches
─────────────────
{a,1,1250,2500}
(1 row)
#v-
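
If the .*? has to stay, another option might be the trick the manual
describes for forcing greediness: wrap the whole RE in a non-capturing
group with a {1,1} quantifier. A sketch, using the manual's example first;
applying the same wrapper to the pattern from the report is an untested
assumption, so no output is shown for it:

```sql
-- Manual example: the outer (?:...){1,1} makes the RE greedy overall,
-- while the .*? inside it stays non-greedy.
SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
-- manual reports: {abc,01234,xyz}

-- Same wrapper applied to the pattern from the report (untested).
SELECT regexp_matches('a 1x1250x2500',
  '(?:(a).*?([1-9]\d*)\s*x\s*([1-9]\d*)(?:\s*x\s*([1-9]\d*))?){1,1}');
```

Note that regexp_match() needs PostgreSQL 10 or later, and that the
wrapper changes only the overall greediness, not the capture groups.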

depesz
