On 1/26/22 13:35, Shaozhong SHI wrote:
>
>
> On Tue, 25 Jan 2022 at 17:10, Shaozhong SHI <shishaozhong(at)gmail(dot)com>
> wrote:
>
> Standard Postgres seems to lack a function to do the following:
>
> It is easy to count the number of occurrences of words, but it is
> rather difficult to count the number of occurrences of phrases.
>
> For instance:
>
> A cell with the value 'Hello World' contains 1 occurrence of a phrase.
>
> A cell with the value 'Hello World World Hello' contains no
> occurrence of any repeated phrase.
>
> But a cell with the value 'Hello World World Hello Hello World'
> contains 2 occurrences of 'Hello World'.
>
> 'The City of London, London' also has no occurrences of any
> repeated phrase.
>
> Does anyone have a function that counts the number of
> occurrences of any repeated phrases?
>
> Regards,
>
> David
>
>
> Hi, All Friends,
>
> In any case, can we try to build a regex for 'The City of London
> London Great London UK ' ?
>
> It could be something like '[\w\s]+[\s-]+[a-z]+[\s-][\s\w]+'.
> The [\s-]+[a-z]+[\s-] part caters for people who write 'City of
> London' as 'City-of-London'.
>
> Regards,
>
> David
Do you really want "The City of", by itself, to be one of the detected
phrases? For example, in 'The City of London London Great London UK The
City of Liverpool'.
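
To make the question concrete, here is a rough sketch (in Python, outside
Postgres) of one possible reading of "repeated phrase": any run of two or
more consecutive words that appears more than once in the cell. The
function name repeated_phrases, the min_words parameter, and the choice to
split on \w+ and compare case-sensitively are all my own assumptions, not
anything Postgres provides.

```python
import re
from collections import Counter


def repeated_phrases(text, min_words=2):
    """Return {phrase: count} for word n-grams that occur more than once.

    Assumptions (hypothetical, for illustration only):
    - a "word" is any run of \\w characters, so punctuation is ignored;
    - a "phrase" is min_words or more consecutive words;
    - comparison is case-sensitive and overlapping occurrences all count.
    """
    words = re.findall(r"\w+", text)
    counts = Counter()
    # Enumerate every n-gram of every length from min_words upwards.
    for n in range(min_words, len(words) + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    # Keep only the phrases seen at least twice.
    return {phrase: c for phrase, c in counts.items() if c > 1}


print(repeated_phrases("Hello World World Hello Hello World"))
print(repeated_phrases(
    "The City of London London Great London UK The City of Liverpool"))
```

Under these assumptions the first call reports only 'Hello World' (twice),
which agrees with the example in the original post, while the second
reports 'The City', 'City of' and 'The City of', each twice. If those are
not wanted, the definition of a phrase needs tightening before any regex
can be settled on.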