Re: [E] Regexp_replace bug / does not terminate on long strings

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Markhof, Ingolf" <ingolf(dot)markhof(at)de(dot)verizon(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: [E] Regexp_replace bug / does not terminate on long strings
Date: 2021-08-20 19:32:26
Message-ID: D84B8669-286F-4F90-89D6-6566D93C2C08@enterprisedb.com
Lists: pgsql-general

> On Aug 20, 2021, at 9:52 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> "a*" is easy. "(a*)\1" is less easy --- if you let the a* consume the
> whole string, you will not get a match, even though one is possible.
> In general, backrefs create a mess in what would otherwise be a pretty
> straightforward concept :-(.

The following queries take radically different amounts of time to run:

\timing
select regexp_replace(
repeat('someone,one,one,one,one,one,one,', 60),
'(?<=^|,)([^,]+)(?:,\1)+(?=$|,)',
'\1', -- replacement
'g' -- apply globally (all matches)
);

Time: 16476.529 ms (00:16.477)

select regexp_replace(
repeat('someone,one,one,one,one,one,one,', 60),
'(?<=^|,)([^,]+)(?:,\1){5}(?=$|,)',
'\1', -- replacement
'g' -- apply globally (all matches)
);

Time: 1.452 ms

The only difference between the patterns is the + versus the {5}. It looks to me like the first pattern should greedily match five ",one" repetitions and be forced to stop because ",someone" doesn't match, while the second should grab the five ",one" repetitions it was told to grab and never try ",someone". Other than that, they should be performing the same work, so I don't see why the performance should be so different.
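
As a rough cross-check of that reasoning, here is a minimal sketch in Python that compares what the two patterns match on the same input. It rests on assumptions: Python's re module is a different regex engine from PostgreSQL's, so it says nothing about the timings above, and it rejects the variable-width lookbehind (?<=^|,), so the sketch rewrites that as (?:^|(?<=,)), which I believe accepts the same positions. The helper names dedupe and matched_spans are made up for illustration.

```python
import re

# One chunk of the test input; the queries above repeat it 60 times.
CHUNK = "someone,one,one,one,one,one,one,"

# Assumption: (?<=^|,) is rewritten as (?:^|(?<=,)) because Python's re
# only accepts fixed-width lookbehind. The rest of each pattern is unchanged.
PLUS = re.compile(r"(?:^|(?<=,))([^,]+)(?:,\1)+(?=$|,)")
FIVE = re.compile(r"(?:^|(?<=,))([^,]+)(?:,\1){5}(?=$|,)")


def dedupe(pattern, text):
    """Replace every match with its first group, like the 'g' flag does."""
    return pattern.sub(r"\1", text)


def matched_spans(pattern, text):
    """Return the (start, end) offsets of every non-overlapping match."""
    return [m.span() for m in pattern.finditer(text)]


text = CHUNK * 60
plus_result = dedupe(PLUS, text)
five_result = dedupe(FIVE, text)

# Both patterns should pick out the same spans and produce the same output.
same_spans = matched_spans(PLUS, text) == matched_spans(FIVE, text)
same_output = plus_result == five_result
```

In this engine, at least, each chunk yields a single match covering the six "one" fields, and neither pattern ever matches inside "someone". That is consistent with the expectation that the two patterns do the same matching work on this input.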


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
