Mixing greediness in regexp_matches

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Mixing greediness in regexp_matches
Date: 2019-12-23 14:56:47
Message-ID: 306b726b-f185-4668-bffe-ac8e7f78878e@manitou-mail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

When looking into how to implement a global replace of multiple
substrings (each with their own replacement) in sql
or plpgsql, I'm wondering if/how an RE with an alternation
can be used.

The basic idea is to iterate on the rows produced by

regexp_matches(string, '(.*?)(foo|bar|foobar)', 'g')

to break down the string into pairs of (non-matching segment,
matching segment) so that a final result can be assembled
from that (setting aside the last non-matching segment, that
can be retrieved in a final step).

The difficulty is that the longest strings in the alternation
should be prioritized, but the starting (.*?) makes the RE
non-greedy so "foo" is choosen over "foobar".

The doc at [1] leaves me unoptimistic when it mentions that:

"...when an RE contains both greedy and non-greedy
subexpressions, the total match length is either as long as possible
or as short as possible, according to the attribute assigned to the
whole RE. The attributes assigned to the subexpressions only affect
how much of that match they are allowed to “eat” relative to each
other."

Also it gives this example of forcing the RE as a whole to
be greedy despite it having a non-greedy sub-RE:
regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}')
but it doesn't seem to be able to produce the desired result
in the case of the RE in the middle being an alternation
with strings of different lengths.

The ideal RE with a (foo|foobar|bar) alternation, when applied
globally to a string like 'the string has foo and foobar and bar and
more' would produce something like:

{"the string has ","foo"}
{" and ","foobar"}
{" and ","bar"}

Is that possible with regexp_matches?

[1] https://www.postgresql.org/docs/current/functions-matching.html

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2019-12-23 15:15:40 Re: Mixing greediness in regexp_matches
Previous Message Thomas Kellerer 2019-12-23 14:49:03 Re: SQL operator '*='