From: | "Markhof, Ingolf" <ingolf(dot)markhof(at)de(dot)verizon(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: [E] Re: Regexp_replace bug / does not terminate on long strings |
Date: | 2021-08-23 09:01:09 |
Message-ID: | CALZg0g44=af4xoXcrWqnje3=sGK8f2P1mNTR9OiFBb1msYgiCg@mail.gmail.com |
Lists: | pgsql-general |
Right. Considering a longer sequence of a's, "(a*)\1" allows a wide variety
of matches. But that is not actually what I was trying to use. I was looking
more at "(a)\1*", which should match exactly what "a+" matches. As
matching is greedy, "(a)\1*" should consume all the a's in a sequence in one
go, just like "a+" does...?!
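A quick way to check the claim that "(a)\1*" and "a+" accept the same strings is to compare them on every run of a's up to some length. The sketch below uses Python's `re` module, which is a backtracking engine and not PostgreSQL's regex library, so it only speaks to *what* the two patterns match, not to how quickly any particular engine finds the match. The helper names are hypothetical.

```python
import re

# Two patterns that should accept exactly the same strings:
# "a+" (one or more a's) and "(a)\1*" (one captured "a", followed by
# zero or more repetitions of that same captured text).
PLUS = re.compile(r"a+")
BACKREF = re.compile(r"(a)\1*")

def same_language(max_len: int = 20) -> bool:
    """Compare full-string matches of both patterns on a^0 .. a^max_len."""
    for n in range(max_len + 1):
        s = "a" * n
        if bool(PLUS.fullmatch(s)) != bool(BACKREF.fullmatch(s)):
            return False
    return True

def greedy_prefix(pattern: re.Pattern, s: str) -> str:
    """Return the match anchored at the start of s, or '' if there is none."""
    m = pattern.match(s)
    return m.group(0) if m else ""

if __name__ == "__main__":
    print(same_language())
    print(greedy_prefix(PLUS, "aaaab"))
    print(greedy_prefix(BACKREF, "aaaab"))
```

Both patterns grab the whole leading run of a's here; whether a given engine gets there without extra work for the backreference is a separate question.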
Regards,
Ingolf
On Fri, Aug 20, 2021 at 6:52 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Markhof, Ingolf" <ingolf(dot)markhof(at)de(dot)verizon(dot)com> writes:
> > thank you very much for your reply. Actually, I was assuming all these
> > regular expressions are based on the same core implementation.
>
> They are not. There are at least three fundamentally different
> implementation technologies (DFA, NFA, hybrid). Friedl's "Mastering
> Regular Expressions" cites multiple different programs using each
> of those, every one of which behaves a bit differently when you start
> poking at corner cases. And that's just in the open-source world;
> I don't know what Oracle is using, but I bet it ain't open source.
>
> > I am also surprised that you say the (\1)+ subpattern is computationally
> > expensive. Regular expressions are greedy by default. I.e. in case of a*
> > matching against a string of 1000 a's, the system will not try a, aa, aaa,
> > ... and so on, right? Instead, it will consume all the a's in one go.
>
> "a*" is easy. "(a*)\1" is less easy --- if you let the a* consume the
> whole string, you will not get a match, even though one is possible.
> In general, backrefs create a mess in what would otherwise be a pretty
> straightforward concept :-(.
>
> regards, tom lane
>
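Tom's point about "(a*)\1" can be made concrete with a toy matcher. The function below is hypothetical and is not how PostgreSQL (or any real engine) is implemented; it only mimics what a greedy backtracking engine does for the fully anchored pattern: try the longest possible group first, then give back one "a" at a time until the backreference can repeat the group exactly. Python's `re` is used purely as a cross-check.

```python
import re

def match_double_run(s: str):
    r"""Hypothetical hand-rolled matcher for the anchored pattern ^(a*)\1$.

    The group first takes the longest leading run of a's it can; on
    failure it gives back one character at a time, like a greedy
    backtracking engine. Returns (group, attempts), or (None, attempts)
    if no split works.
    """
    run = len(s) - len(s.lstrip("a"))   # length of the leading run of a's
    attempts = 0
    for k in range(run, -1, -1):        # greedy: longest candidate first
        attempts += 1
        group = s[:k]
        if s[k:] == group:              # the backreference must repeat the group exactly
            return group, attempts
    return None, attempts

if __name__ == "__main__":
    for s in ["aaaa", "aaa", "a" * 10]:
        ours = match_double_run(s)
        m = re.fullmatch(r"(a*)\1", s)
        print(repr(s), ours, m.group(1) if m else None)
```

On a run of 2m a's the greedy first guess (the whole string) always fails, and the match is only found after m+1 attempts; on an odd-length run every candidate fails. Searching at every start position, or combining the backreference with further quantifiers, can multiply those attempts in a backtracking engine.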
======================================================================
Verizon Deutschland GmbH - Sebrathweg 20, 44149 Dortmund, Germany - Amtsgericht Dortmund, HRB 14952 - Geschäftsführer: Detlef Eppig - Vorsitzender des Aufsichtsrats: Francesco de Maio