Re: Counting the number of repeated phrases in a column

From: Shaozhong SHI <shishaozhong(at)gmail(dot)com>
To: Ivan Panchenko <i(dot)panchenko(at)postgrespro(dot)ru>
Cc: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Counting the number of repeated phrases in a column
Date: 2022-01-26 08:11:52
Message-ID: CA+i5Jwb0w4CeGGwPhauHE9an64A1p1kDkZeSOFXPBtH+TAmO0w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, 25 Jan 2022 at 21:33, Ivan Panchenko <i(dot)panchenko(at)postgrespro(dot)ru>
wrote:

>
> On 26.01.2022 00:21, benj(dot)dev(at)laposte(dot)net wrote:
> > Le 25/01/2022 à 18:10, Shaozhong SHI a écrit :
> >> There is a short of a function in the standard Postgres to do the
> >> following:
> >>
> >> It is easy to count the number of occurrence of words, but it is
> >> rather difficult to count the number of occurrence of phrases.
> >>
> >> For instance:
> >>
> >> A cell of value: 'Hello World' means 1 occurrence a phrase.
> >>
> >> A cell of value: 'Hello World World Hello' means no occurrence of any
> >> repeated phrase.
> >>
> >> But, A cell of value: 'Hello World World Hello Hello World' means 2
> >> occurrences of 'Hello World'.
> >>
> >> 'The City of London, London' also has no occurrences of any repeated
> >> phrase.
> >>
> >> Anyone has got such a function to check out the number of occurrence
> >> of any repeated phrases?
> >>
> >> Regards,
> >>
> >> David
> >
> > Don't know if it's exactly what you want, but you can replace all
> > occurence of the phrase in the text by empty string and compute the
> > diff between the initial and the result and next divide by the length
> > of your phrase.
> >
> > Example :
> > WITH x AS (SELECT 'toto like tata and toto like titi and toto like
> > tutu' , 'toto like' phrase)
> > SELECT (char_length(texte) - char_length(replace(texte, phrase, '')))
> > / char_length(phrase) AS nb_occurence
> > FROM x
> >
> This works if the user knows the phrase. As far as I understood, the
> phrase is not known, and user wants to count number of repeats of any
> phrases.
> Of course this can be done with recursive CTE. Split into words,
> generate all phrases (AFAIK requires recursion), then group and count.
>
> But probably in PL/Perl this could be done more effectively.
>

Is there an example of using recursive CTE to split a text string into
words?

Regards,

David

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message benj.dev 2022-01-26 09:52:59 Re: Counting the number of repeated phrases in a column
Previous Message Jian He 2022-01-26 06:05:41 Re: Counting the number of repeated phrases in a column