From: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
---|---|
To: | Shaozhong SHI <shishaozhong(at)gmail(dot)com> |
Cc: | pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Counting the number of repeated phrases in a column |
Date: | 2022-01-26 23:23:26 |
Message-ID: | CAHyXU0x501igQ2x_83wTfsL1pg0e9aRYwr0ScxSR+w95r8CJPg@mail.gmail.com |
Lists: | pgsql-general |
On Tue, Jan 25, 2022 at 11:10 AM Shaozhong SHI <shishaozhong(at)gmail(dot)com> wrote:
>
> There is a short of a function in the standard Postgres to do the following:
>
> It is easy to count the number of occurrence of words, but it is rather difficult to count the number of occurrence of phrases.
>
> For instance:
>
> A cell of value: 'Hello World' means 1 occurrence a phrase.
>
> A cell of value: 'Hello World World Hello' means no occurrence of any repeated phrase.
>
> But, A cell of value: 'Hello World World Hello Hello World' means 2 occurrences of 'Hello World'.
>
> 'The City of London, London' also has no occurrences of any repeated phrase.
>
> Anyone has got such a function to check out the number of occurrence of any repeated phrases?
Let's define a phrase as a sequence of two or more words, delimited by
spaces. You could find them with something like:
with s as (select 'Hello World Hello World' as sentence)
select
  phrase,
  array_upper(string_to_array((select sentence from s), phrase), 1) - 1 as occurrences
from
(
  select array_to_string(x, ' ') as phrase
  from
  (
    select distinct v[a:b] x
    from regexp_split_to_array((select sentence from s), ' ') v
    cross join lateral generate_series(1, array_upper(v, 1)) a
    cross join lateral generate_series(a + 1, array_upper(v, 1)) b
  ) q
) q;
This would be slow for large sentences, obviously, since it tries every
run of two or more consecutive words, and you'd probably want to prepare
the string first by stripping some characters (punctuation and such).
merlin
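
A possible variation on the query above, offered as a sketch rather than as part of the original reply. It assumes ASCII text, treats every run of non-alphanumeric characters as a word separator, and uses made-up CTE names (cleaned, w, spans). Instead of splitting the sentence on each candidate phrase, it groups the enumerated word runs directly, so a phrase can only match on whole words, and it keeps only phrases that occur more than once. Note that overlapping occurrences are counted separately here, which differs from the string_to_array approach.

```sql
-- Sketch only: the cleanup rule and the CTE names are assumptions,
-- not something taken from the thread.
with s as (
  select 'Hello World, World Hello. Hello World' as sentence
),
cleaned as (
  -- lowercase, turn each run of non-alphanumerics into one space, trim the ends
  select btrim(regexp_replace(lower(sentence), '[^a-z0-9]+', ' ', 'g')) as sentence
  from s
),
w as (
  select string_to_array(sentence, ' ') as v
  from cleaned
),
spans as (
  -- every contiguous run of two or more words, one row per starting position
  select array_to_string(v[a:b], ' ') as phrase
  from w
  cross join lateral generate_series(1, array_upper(v, 1)) a
  cross join lateral generate_series(a + 1, array_upper(v, 1)) b
)
select phrase, count(*) as occurrences
from spans
group by phrase
having count(*) > 1
order by phrase;
```

For the sample sentence this should report only 'hello world', with a count of 2. The number of word runs still grows quadratically with the number of words, so the performance caveat above applies to this version as well.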