Re: [E] Regexp_replace bug / does not terminate on long strings

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Miles Elam <miles(dot)elam(at)productops(dot)com>
Cc: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: [E] Regexp_replace bug / does not terminate on long strings
Date: 2021-08-20 20:26:37
Message-ID: 060D9E3A-C827-4EDD-AFB7-C70EF5DCD186@enterprisedb.com
Lists: pgsql-general

> On Aug 20, 2021, at 12:51 PM, Miles Elam <miles(dot)elam(at)productops(dot)com> wrote:
>
> Unbounded ranges seem like a problem.

Seems so. The problem appears to be in regcomp.c's repeat() function, which handles {1,SOME} differently than {1,INF}.

> Seems worth trying a range from 1 to N where you play around with N to find your optimum performance/functionality tradeoff. {1,20} is like '+' but clamps at 20.

For any such value (5, 20, whatever) there can always be a string with more repeated words than the number you've chosen, and the call to regexp_replace won't do what you want. There is also an upper bound at work, because values above 255 will draw a regex compilation error. So it seems worth a bit of work to determine why the regex engine has bad performance in these cases.
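To illustrate the first point, here is a minimal sketch in Python. The pattern is hypothetical (it is not the OP's regex), and Python's re module is a different engine from PostgreSQL's, with \b where PostgreSQL uses \y, so this shows only the matching semantics of a clamped range, not the performance problem.

```python
import re

# Hypothetical pattern, not the OP's: a word followed by one to three
# further copies of itself. The {1,3} clamp stands in for {1,20}.
BOUNDED = re.compile(r"\b(\w+)(?:\s+\1\b){1,3}")


def collapse_bounded(text: str) -> str:
    """Replace each matched run of repeated words with a single copy."""
    return BOUNDED.sub(r"\1", text)


# Three copies fit within the clamp, so the run collapses completely.
print(collapse_bounded("the the the cat"))

# Six copies exceed the clamp: the first match consumes four copies,
# the second consumes the remaining two, and a duplicate survives.
print(collapse_bounded("the the the the the the cat"))
```

Whatever upper bound is chosen, a run longer than the bound plus one is split across several matches, each of which leaves its own copy behind.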

It sounds like the OP will be working around this problem by refactoring to call regexp_replace multiple times until all repeats are eradicated, but I don't think such workarounds should be necessary.
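For reference, that kind of workaround might look roughly like the sketch below. It is written in Python rather than SQL, the pattern and the names collapse_repeats and max_passes are hypothetical, and Python's re engine is not PostgreSQL's, so treat it as an outline of the loop only. It removes one adjacent duplicate per match, with no unbounded quantifier over the backreference, and repeats until a pass changes nothing.

```python
import re

# Hypothetical pattern: exactly one adjacent duplicate of a word.
# Note there is no + or {1,} wrapped around the backreference.
PAIR = re.compile(r"\b(\w+)\s+\1\b")


def collapse_repeats(text: str, max_passes: int = 64) -> str:
    """Apply the pairwise replacement until the string stops changing.

    max_passes is an arbitrary safety cap chosen for this sketch. Each
    pass roughly halves the length of a run, so long runs need only a
    logarithmic number of passes.
    """
    for _ in range(max_passes):
        reduced = PAIR.sub(r"\1", text)
        if reduced == text:
            return reduced
        text = reduced
    return text


print(collapse_repeats("the the the the the cat sat sat down"))
```

The same shape should carry over to a loop around regexp_replace on the database side, at the cost of several passes over the string.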


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
