From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
---|---|
To: | Shaozhong SHI <shishaozhong(at)gmail(dot)com> |
Cc: | pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Match 2 words and more |
Date: | 2021-11-28 00:49:07 |
Message-ID: | 202111280049.bmeoep6puysk@alvherre.pgsql |
Lists: | pgsql-general |
On 2021-Nov-28, Shaozhong SHI wrote:
> this is supposed to find those to have 2 words and more.
>
> select name FROM a_table where "STREET_NAME" ~ '^[[:alpha:]+ ]+[:alpha:]+$';
>
> But, it finds only one word as well.
How about something like this?
'^([[:<:]][[:alpha:]]+[[:>:]]( |$)){2}$'
You have:
- the ^ is a constraint that matches start of string
- you have a ( ... ){2}$ construct which means "match exactly twice" and
  then match end-of-string
- Inside the parens of that construct, you match:
  - [[:<:]] which means start-of-word
  - [[:alpha:]]+ which means "a non-empty set of alphabetical chars"
  - [[:>:]] which means end-of-word
  - ( |$) for "either a space or end-of-string"
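A quick way to check that reading is to run the pattern over a few literal values instead of the real table. This is only a sketch: the sample strings and the column alias two_words are made up for illustration.

```sql
-- Hypothetical sample values; the pattern is the one suggested above.
SELECT s,
       s ~ '^([[:<:]][[:alpha:]]+[[:>:]]( |$)){2}$' AS two_words
FROM (VALUES
        ('Main'),             -- one word: should not match
        ('Main Street'),      -- two words: should match
        ('Old Main Street')   -- three words: {2} should reject it
     ) AS t(s);
```

Only the 'Main Street' row should come back true, since {2} demands exactly two repetitions of "word followed by a space or end-of-string".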
You can perhaps simplify by removing the [[:<:]] and [[:>:]]
constraints, so '^([[:alpha:]]+( |$)){2}$'
To mean "between two and four", change the {2} to {2,4}. If you want
"two or more", try {2,}.
You could change the ( |$) to ([[:space:]]+|$) in order to accept more
than one space between words, or combinations of spaces, tabs,
newlines and so on.
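To compare these variants side by side, something like the following might help. It is a sketch under assumptions: the sample strings and column aliases are invented, the simplified pattern without the word-boundary constraints is used, and whitespace is matched with the standard POSIX class [[:space:]].

```sql
-- Hypothetical sample values; each column tries one quantifier variant.
SELECT s,
       s ~ '^([[:alpha:]]+( |$)){2}$'              AS exactly_two,
       s ~ '^([[:alpha:]]+( |$)){2,4}$'            AS two_to_four,
       s ~ '^([[:alpha:]]+( |$)){2,}$'             AS two_or_more,
       s ~ '^([[:alpha:]]+([[:space:]]+|$)){2,}$'  AS any_whitespace
FROM (VALUES
        ('Main'),                      -- one word
        ('Main Street'),               -- two words
        ('Old Main Street'),           -- three words
        ('The Very Old Main Street'),  -- five words
        ('Main   Street')              -- two words, three spaces between
     ) AS t(s);
```

The one-word row should be false in every column, the five-word row should be true only for the open-ended {2,} variants, and the row with three spaces should be true only in the any_whitespace column, because ( |$) accepts a single space at most.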
With a decent set of data, you could probably notice some other problems
in this regexp, but at least it should be a decent start.
> It appears that regex is not robust.
Nah.
--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/