From: | Brendan Jurd <direvus(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, depesz(at)depesz(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Our regex vs. POSIX on "longest match" |
Date: | 2012-03-05 09:22:43 |
Message-ID: | CADxJZo1fbE9FA+pW89dNqqiPpLstSxYKug9TLQcS_q+J7wF+_A@mail.gmail.com |
Lists: | pgsql-hackers |
On 5 March 2012 17:23, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> This is different from what Perl does, but I think Perl's behavior
> here is batty: given a+|a+b+ and the string aaabbb, it picks the first
> branch and matches only aaa.
Yeah, this is sometimes referred to as "ordered alternation": the
branches of the alternation are prioritised in the same order in
which they are written. It is fairly commonplace among regex
implementations.
> apparently, it selects the syntactically first
> branch that can match, regardless of the length of the match, which
> strikes me as nearly pure evil.
As long as it's documented that alternation prioritises in this way, I
don't feel upset about it. At least it still provides you with a
sensible way to get whatever you want from your RE; if you want a
shorter alternative to be preferred, put it first. Ordered
alternation also gives you a way to specify which of several
same-length alternatives you would prefer to be matched, which can
come in handy. It also means you can specify less-complex
alternatives before more-complex ones, which can have performance
advantages.
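To make the point concrete, here is a minimal sketch using Python's re
module, which (like Perl) does ordered alternation. The variable names
are just for illustration, and the third pattern is a made-up example
of two same-length alternatives rather than anything from this thread.

```python
import re

subject = "aaabbb"

# Ordered alternation: the first branch that can match wins,
# even though the second branch would match more text.
short_first = re.match(r"a+|a+b+", subject).group()

# Writing the longer branch first makes it the preferred one.
long_first = re.match(r"a+b+|a+", subject).group()

# Two alternatives that both match "ab" with the same length:
# the order decides which branch (and so which group) is used.
same_len = re.match(r"(?P<first>ab)|(?P<second>a[a-z])", "ab").lastgroup

print(short_first)  # aaa
print(long_first)   # aaabbb
print(same_len)     # first
```

So the author of the RE stays in control of the outcome simply by
choosing the order of the branches.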
I do agree with you that if you *don't* do ordered alternation, then
it is right to treat alternation as greedy by default.
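For contrast, a rough sketch of what "longest branch wins" means,
emulated in Python by trying each top-level branch separately at the
start of the string and keeping the longest result. The helper
longest_branch_match is hypothetical, and this is only an
illustration of the behaviour, not how our regex engine or any POSIX
implementation actually works internally.

```python
import re

def longest_branch_match(branches, subject):
    """Try every branch at the start of subject and return the
    longest matched text, or None if no branch matches."""
    best = None
    for branch in branches:
        m = re.match(branch, subject)
        # Strict '>' means the earlier branch is kept on a tie.
        if m and (best is None or len(m.group()) > len(best)):
            best = m.group()
    return best

# The order of the branches no longer affects the result.
print(longest_branch_match(["a+", "a+b+"], "aaabbb"))  # aaabbb
print(longest_branch_match(["a+b+", "a+"], "aaabbb"))  # aaabbb
```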
Cheers,
BJ