Re: Match 2 words and more

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Shaozhong SHI <shishaozhong(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Match 2 words and more
Date: 2021-11-28 00:49:07
Message-ID: 202111280049.bmeoep6puysk@alvherre.pgsql
Lists: pgsql-general

On 2021-Nov-28, Shaozhong SHI wrote:

> this is supposed to find those to have 2 words and more.
>
> select name FROM a_table where "STREET_NAME" ~ '^[[:alpha:]+ ]+[:alpha:]+$';
>
> But, it finds only one word as well.

How about something like this?

'^([[:<:]][[:alpha:]]+[[:>:]]( |$)){2}$'

You have:
- the ^ is a constraint that matches start of string
- you have a ( ... ){2}$ construct which means "match exactly twice" and
  then match end-of-string
- Inside the parens of that construct, you match:
  - [[:<:]] which means start-of-word
  - [[:alpha:]]+ which means "a non-empty set of alphabetical chars"
  - [[:>:]] which means end-of-word
  - ( |$) for "either a space or end-of-string"
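
A quick way to check that reading is to run the pattern over a few literals.
This is only a sketch: the sample strings are made up for illustration and
are not from the original table.

```sql
-- Hypothetical sample values; only the pattern is taken from the message.
SELECT s,
       s ~ '^([[:<:]][[:alpha:]]+[[:>:]]( |$)){2}$' AS exactly_two_words
FROM (VALUES ('Main'),
             ('Main Street'),
             ('Old Main Street')) AS t(s);
```

Only 'Main Street' should come back true here: 'Main' has nothing left for
the second repetition, and 'Old Main Street' has a third word that {2} does
not allow.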

You can perhaps simplify by removing the [[:<:]] and [[:>:]]
constraints, so '^([[:alpha:]]+( |$)){2}$'

To mean "between two and four", change the {2} to {2,4}. If you want
"two or more", try {2,}.
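
To see the three quantifiers side by side, something like the following
should do. It uses the simplified pattern without the word-boundary
constraints; the sample strings are again invented for illustration.

```sql
-- Hypothetical sample values; compare {2}, {2,4} and {2,} on the same input.
SELECT s,
       s ~ '^([[:alpha:]]+( |$)){2}$'   AS exactly_two,
       s ~ '^([[:alpha:]]+( |$)){2,4}$' AS two_to_four,
       s ~ '^([[:alpha:]]+( |$)){2,}$'  AS two_or_more
FROM (VALUES ('Main'),
             ('Main Street'),
             ('Old Main Street North East')) AS t(s);
```

The one-word row should be false in every column, the two-word row true in
every column, and the five-word row true only for {2,}.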

You could change the ( |$) to ([[:space:]]+|$) in order to accept more
than one space between words, or combinations of spaces, tabs,
newlines and so on.
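
A small sketch of the difference, assuming the POSIX whitespace class as
PostgreSQL spells it, [[:space:]]. The sample strings are made up, and
chr(9) is used to build a tab so no escape-string syntax is needed.

```sql
-- Hypothetical sample values: one space, two spaces, and a tab between words.
SELECT replace(s, chr(9), '<TAB>') AS shown,
       s ~ '^([[:alpha:]]+( |$)){2,}$'             AS single_space_only,
       s ~ '^([[:alpha:]]+([[:space:]]+|$)){2,}$'  AS any_whitespace
FROM (VALUES ('Main Street'),
             ('Main  Street'),
             ('Main' || chr(9) || 'Street')) AS t(s);
```

The single-space pattern should accept only the first row, while the
[[:space:]]+ variant should accept all three.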

With a decent set of data, you could probably notice some other problems
in this regexp, but at least it should be a decent start.
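
Two such problems that are easy to reproduce, as a sketch with invented
sample values: a trailing space is accepted, because the last repetition may
end with a space before the final $, and anything that is not a letter
(apostrophes, digits, hyphens) makes the whole match fail.

```sql
-- Hypothetical sample values illustrating edge cases of the simplified pattern.
SELECT '[' || s || ']' AS shown,
       s ~ '^([[:alpha:]]+( |$)){2,}$' AS two_or_more
FROM (VALUES ('Main Street '),      -- trailing space still matches
             ('St John''s Road'),   -- apostrophe is not [[:alpha:]]
             ('Route 66')) AS t(s); -- digits are not [[:alpha:]]
```

Whether those results are wanted depends on the data, so treat this as a
starting point for testing rather than a verdict on the pattern.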

> It appears that regex is not robust.

Nah.

--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
