Re: Mixing greediness in regexp_matches

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Mixing greediness in regexp_matches
Date: 2019-12-23 15:58:47
Message-ID: ed5241b7-b839-4cec-8c1f-b62245f464d6@manitou-mail.org
Lists: pgsql-general

Tom Lane wrote:

> I'd try forcing the match to be the whole string, ie
>
> ^(.*?)(foo|bar|foobar)(.*)$
>
> which would also save some work for restarting the iteration,
> since you'd have already captured the all-the-rest substring.

In that case regexp_matches() will return 0 or 1 row, even with
the 'g' flag, because the pattern is anchored to the whole string.
With the above-mentioned example, that would be:

=> select regexp_matches('the string has foo and foobar and bar and more',
'^(.*?)(foo|foobar|bar)(.*)$', 'g');
regexp_matches
--------------------------------------------------------
{"the string has ",foo," and foobar and bar and more"}

So the next iteration would consist of calling regexp_matches()
on result[3] (the rest of the string), and so on until no match is found.
I think it would work as desired, but it would probably be much less
efficient on large strings or with a large number of matches than a
single call of regexp_matches() that could return all matches.
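
For illustration, the iteration could be written as a single statement
with a recursive CTE. This is only a minimal sketch, assuming
regexp_match() is available (PostgreSQL 10 or later); the names iter, n,
m, prefix and matched are made up for the example:

```sql
-- Sketch: repeatedly apply the anchored pattern to the "rest" capture.
-- iter, n, m, prefix and matched are illustrative names only.
WITH RECURSIVE iter(n, m) AS (
    -- First step: match against the full input string.
    SELECT 1,
           regexp_match('the string has foo and foobar and bar and more',
                        '^(.*?)(foo|foobar|bar)(.*)$')
  UNION ALL
    -- Next steps: match against the third capture of the previous step.
    -- regexp_match() returns NULL when nothing matches, which ends the loop.
    SELECT n + 1,
           regexp_match(m[3], '^(.*?)(foo|foobar|bar)(.*)$')
    FROM iter
    WHERE m IS NOT NULL
)
SELECT n, m[1] AS prefix, m[2] AS matched
FROM iter
WHERE m IS NOT NULL
ORDER BY n;
```

Each step still rescans and copies the remainder of the string, so this
only moves the loop into SQL; it does not address the efficiency concern.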

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
