Re: [E] Re: Regexp_replace bug / does not terminate on long strings

From: "Markhof, Ingolf" <ingolf(dot)markhof(at)de(dot)verizon(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: [E] Re: Regexp_replace bug / does not terminate on long strings
Date: 2021-08-23 09:01:09
Message-ID: CALZg0g44=af4xoXcrWqnje3=sGK8f2P1mNTR9OiFBb1msYgiCg@mail.gmail.com
Lists: pgsql-general

Right. For a longer sequence of a's, "(a*)\1" allows a wide variety
of matches. But in fact, this is not what I was trying to use. I was rather
looking at "(a)\1*", which should match exactly what "a+" matches. As
matching is greedy, "(a)\1*" should consume all the a's in a sequence in one
go, just like "a+" does, shouldn't it?
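
The claimed equivalence of "(a)\1*" and "a+" can at least be checked on the matched span. The following is only a sketch: it assumes Python's re module, which is a backtracking engine and not PostgreSQL's regex implementation, so it says nothing about run time inside regexp_replace. The helper names span_of and same_span are made up for illustration.

```python
import re


def span_of(pattern: str, text: str):
    """Return the (start, end) span matched at the start of text, or None."""
    m = re.match(pattern, text)
    return m.span() if m else None


def same_span(n: int) -> bool:
    """Check that both patterns consume a run of n a's that is followed by one b."""
    text = "a" * n + "b"
    backref_span = span_of(r"(a)\1*", text)
    plus_span = span_of(r"a+", text)
    return backref_span == plus_span == (0, n)


if __name__ == "__main__":
    for n in (1, 2, 10, 1000):
        print(n, same_span(n))
```

Both patterns cover the same text here, but that only shows the results agree in this one engine; how much work an engine does to reach the result is a separate question.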

Regards,
Ingolf

On Fri, Aug 20, 2021 at 6:52 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> "Markhof, Ingolf" <ingolf(dot)markhof(at)de(dot)verizon(dot)com> writes:
> > thank you very much for your reply. Actually, I was assuming all these
> > regular expressions are based on the same core implementation.
>
> They are not. There are at least three fundamentally different
> implementation technologies (DFA, NFA, hybrid). Friedl's "Mastering
> Regular Expressions" cites multiple different programs using each
> of those, every one of which behaves a bit differently when you start
> poking at corner cases. And that's just in the open-source world;
> I don't know what Oracle is using, but I bet it ain't open source.
>
> > I am also surprised that you say the (\1)+ subpattern is computationally
> > expensive. Regular expressions are greedy by default. I.e. in case of a*
> > matching against a string of 1000 a's, the system will not try a, aa, aaa,
> > ... and so on, right? Instead, it will consume all the a's in one go.
>
> "a*" is easy. "(a*)\1" is less easy --- if you let the a* consume the
> whole string, you will not get a match, even though one is possible.
> In general, backrefs create a mess in what would otherwise be a pretty
> straightforward concept :-(.
>
> regards, tom lane
>
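
Tom's point about "(a*)\1" can be illustrated with a small sketch. It assumes Python's re module, a backtracking engine that is not PostgreSQL's regex implementation, and the helper name group_length is hypothetical. If the group consumed the whole string, the backreference would have nothing left to match, so the engine has to settle on a shorter group.

```python
import re


def group_length(n: int):
    """Length of group 1 when "(a*)\\1" must match all of n a's, or None if it cannot."""
    m = re.fullmatch(r"(a*)\1", "a" * n)
    return len(m.group(1)) if m else None


if __name__ == "__main__":
    # An even-length run is split in half between the group and the backreference;
    # an odd-length run cannot be matched in full at all.
    for n in (0, 4, 6, 7):
        print(n, group_length(n))

    # Without the requirement to match the whole string, the longest usable
    # prefix of seven a's is six characters (group of three, repeated once).
    print(re.match(r"(a*)\1", "a" * 7).span())
```

The group ends up with half of the run rather than all of it, which is why a backreference rules out the simple "consume everything in one go" strategy that works for plain "a*".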

======================================================================

Verizon Deutschland GmbH - Sebrathweg 20, 44149 Dortmund, Germany - Amtsgericht Dortmund, HRB 14952 - Geschäftsführer: Detlef Eppig - Vorsitzender des Aufsichtsrats: Francesco de Maio
