Re: Counting the number of repeated phrases in a column

From: benj(dot)dev(at)laposte(dot)net
To: Shaozhong SHI <shishaozhong(at)gmail(dot)com>, pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Counting the number of repeated phrases in a column
Date: 2022-01-25 21:21:05
Message-ID: b1ab1a9f-bcd0-7630-ed2c-4d596bca3e19@laposte.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Le 25/01/2022 à 18:10, Shaozhong SHI a écrit :
> There is a short of a function in the standard Postgres to do the following:
>
> It is easy to count the number of occurrence of words, but it is rather
> difficult to count the number of occurrence of phrases.
>
> For instance:
>
> A cell of value:  'Hello World' means 1 occurrence a phrase.
>
> A cell of value: 'Hello World World Hello' means no occurrence of any
> repeated phrase.
>
> But, A cell of value: 'Hello World World Hello Hello World' means 2
> occurrences of 'Hello World'.
>
> 'The City of London, London' also has no occurrences of any repeated phrase.
>
> Anyone has got such a function to check out the number of occurrence of
> any repeated phrases?
>
> Regards,
>
> David

Don't know if it's exactly what you want, but you can replace all
occurence of the phrase in the text by empty string and compute the diff
between the initial and the result and next divide by the length of your
phrase.

Example :
WITH x AS (SELECT 'toto like tata and toto like titi and toto like tutu'
, 'toto like' phrase)
SELECT (char_length(texte) - char_length(replace(texte, phrase, ''))) /
char_length(phrase) AS nb_occurence
FROM x

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ivan Panchenko 2022-01-25 21:33:03 Re: Counting the number of repeated phrases in a column
Previous Message Ben Chobot 2022-01-25 18:53:24 NIST 800-53v4 scanning?