From: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
---|---|
To: | Michael Glaesemann <michael(dot)glaesemann(at)myyearbook(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Pathological regexp match |
Date: | 2010-01-29 04:21:42 |
Message-ID: | 20100129042142.GF1793@alvh.no-ip.org |
Lists: | pgsql-hackers |
Michael Glaesemann wrote:
> However, as you point out, Postgres doesn't appear to take this into
> account:
>
> postgres=# select regexp_replace('oooZQoooAoooQooQooQooo', $r$(Z(Q)[^Q]*A.*(\2))$r$, $s$X$s$);
> regexp_replace
> ----------------
> oooXooo
> (1 row)
>
> postgres=# select regexp_replace('oooZQoooAoooQooQooQooo', $r$(Z(Q)[^Q]*A.*?(\2))$r$, $s$X$s$);
> regexp_replace
> ----------------
> oooXooo
> (1 row)
I think the reason for this is that the first * is greedy and thus the
entire expression is considered greedy. The fact that you've made the
second * non-greedy does not ungreedify the RE ... Note the docs say:

    The above rules associate greediness attributes not only with
    individual quantified atoms, but with branches and entire REs
    that contain quantified atoms. What that means is that the
    matching is done in such a way that the branch, or whole RE,
    matches the longest or shortest possible substring as a whole.

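To illustrate the rule quoted above, here is the pair of examples the PostgreSQL documentation itself gives for it (they are not from this thread). The only difference between the two patterns is the greediness of the first quantifier, and that alone decides whether the RE as a whole matches the longest or the shortest possible substring:

```sql
-- First quantified atom (Y*) is greedy, so the whole RE is greedy:
-- the match is as long as possible and [0-9]{1,3} gets three digits.
SELECT substring('XY1234Z', 'Y*([0-9]{1,3})');   -- 123

-- First quantified atom (Y*?) is non-greedy, so the whole RE is non-greedy:
-- the match is as short as possible and [0-9]{1,3} gets one digit.
SELECT substring('XY1234Z', 'Y*?([0-9]{1,3})');  -- 1
```

The same reasoning would seem to apply to the quoted queries: [^Q]* is the first quantified atom, so changing the quantifier on .* afterwards does not change the overall length of the match.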
It's late here so I'm not sure if this is what you're looking for:
alvherre=# select regexp_replace('oooZQoooAoooQooQooQooo', $r$(Z(Q)[^Q]*?A.*(\2))$r$, $s$X$s$);
regexp_replace
----------------
oooXooQooQooo
(1 row)
(Obviously the non-greediness has moved somewhere else) :-(
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support