Re: Can Postgres beat Oracle for regexp_count?

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Shaozhong SHI <shishaozhong(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Can Postgres beat Oracle for regexp_count?
Date: 2022-02-03 05:50:35
Message-ID: CAKFQuwZ8PUnDWC645Oa1hhbT4LqRO2Kc3GuYyxNiuRU3OLhT1Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Feb 2, 2022 at 10:26 PM Shaozhong SHI <shishaozhong(at)gmail(dot)com>
wrote:

>
> select regexp_matches('My High Street', '([A-Z][a-z]+[\s]*)+', 'g')
> It is intended to match 'My High Street, but it turned out only 'Street'
> was matched.
>
>
I'm too tired to find the documentation for why you saw your result but
basically you only have a single capturing parentheses pair and since
you've quantified that you end up with just the last capture that was found
- Street. If you want to capture the entire found expression you need to
capture the quantifier. So put parentheses around the entire regexp.

select regexp_matches('My High Street', '(([A-Z][a-z]+[\s]*)+)', 'g')

You now have a two element array, slots filled left-to-right based upon the
opening parenthesis. So {"My High Street",Street}

To get rid of the undesired Street and only return a single element array
you need to make the inner parentheses non-capturing.

select regexp_matches('My High Street', '((?:[A-Z][a-z]+[\s]*)+)', 'g')

David J.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2022-02-03 05:59:36 Re: Can Postgres beat Oracle for regexp_count?
Previous Message Shaozhong SHI 2022-02-03 05:26:14 Re: Can Postgres beat Oracle for regexp_count?