Re: Mixing greediness in regexp_matches

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Mixing greediness in regexp_matches
Date: 2019-12-23 16:10:41
Message-ID: 90347d36-4b3c-4806-bd99-e5fae2cfad71@manitou-mail.org
Lists: pgsql-general

Tom Lane wrote:

> regression=# select regexp_split_to_array('junkfoolbarfoolishfoobarmore',
> 'foo|bar|foobar');
> regexp_split_to_array
> -----------------------
> {junk,l,"",lish,more}
> (1 row)
>
> The idea would be to iterate over the array elements, tracking the
> corresponding position in the source string, and re-discovering at
> each break which of the original alternatives must've matched.
>
> It's sort of annoying that we don't have a simple "regexp_location"
> function that would give you back the starting position of the
> first match.

It occurred to me too that regexp_split_to_table or array would make
this problem really easy if only it had a mode to capture and return the
matched parts too.
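
For comparison, Perl's built-in split already behaves this way when the pattern contains a capturing group: each separator that matched is returned between the fields. A minimal sketch, reusing the string from the quoted example; this illustrates Perl's split only, not any existing option of regexp_split_to_array:

```perl
use strict;
use warnings;

my $source = 'junkfoolbarfoolishfoobarmore';

# The capturing parentheses make split return each matched separator
# in between the fields, so no position tracking is needed afterwards.
my @parts = split /(foobar|foo|bar)/, $source;

print join(',', map { "[$_]" } @parts), "\n";
# prints [junk],[foo],[l],[bar],[],[foo],[lish],[foobar],[more]
```

The empty field between [bar] and [foo] corresponds to the "" element in the array shown above.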

FWIW, in plperl, there's a simple solution:

$string =~ s/(foobar|foo|...)/$replace{$1}/g

where %replace is a hash of the substitutions, e.g. (foo=>'baz',...).
The strings in the alternation are tested in their order of
appearance, so you can choose to be greedy or not by just sorting
them by length: longest first to prefer the longest match, shortest
first to prefer the shortest.
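
To make that concrete, here is a minimal standalone sketch in plain Perl (not wrapped in a plperl function). Only the keys foo, bar and foobar and the input string come from this thread; the replacement values are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical substitution table: the replacement values are invented.
my %replace = (foo => 'baz', bar => 'qux', foobar => 'QUUX');

my $input = 'junkfoolbarfoolishfoobarmore';

# Longest key first, so "foobar" is tried before "foo" at each position.
# quotemeta keeps any regex metacharacters in the keys literal.
my $longest_first = join '|', map { quotemeta($_) }
    sort { length($b) <=> length($a) || $a cmp $b } keys %replace;

# Shortest key first, so "foo" wins over "foobar".
my $shortest_first = join '|', map { quotemeta($_) }
    sort { length($a) <=> length($b) || $a cmp $b } keys %replace;

(my $greedy = $input) =~ s/($longest_first)/$replace{$1}/g;
(my $lazy   = $input) =~ s/($shortest_first)/$replace{$1}/g;

print "$greedy\n";   # junkbazlquxbazlishQUUXmore
print "$lazy\n";     # junkbazlquxbazlishbazquxmore
```

The two results differ only at the trailing "foobar": the first ordering replaces it as a whole, the second replaces "foo" and then "bar" separately.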

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
