Re: Grouping By Similarity (using pg_trgm)?

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Cory Tucker <cory(dot)tucker(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Grouping By Similarity (using pg_trgm)?
Date: 2015-05-14 19:08:22
Message-ID: CAKFQuwaoWL3B6sLBAgrKxBrYB1UJLZzWUruSEQUj_QaYApu-nA@mail.gmail.com
Lists: pgsql-general

On Thu, May 14, 2015 at 11:58 AM, Cory Tucker <cory(dot)tucker(at)gmail(dot)com> wrote:

> [pg version 9.3 or 9.4]
>
> Suppose I have a simple table:
>
> create table data (
> my_value TEXT NOT NULL
> );
> CREATE INDEX idx_my_value ON data USING gin(my_value gin_trgm_ops);
>
>
> Now I would like to essentially do group by to get a count of all the
> values that are sufficiently similar. I can do it using something like a
> CROSS JOIN to join the table on itself, but then I still am getting all the
> rows with duplicate counts.
>
> Is there a way to do a group by query and only return a single "my_value"
> column and a count of the number of times other values are similar while
> also not returning the included similar values in the output, too?
>
>
Concept below - a self-join on pg_trgm's similarity operator, grouped by the source value:

SELECT my_value_src, count(*)
FROM (SELECT my_value AS my_value_src FROM data) src
JOIN (SELECT my_value AS my_value_compareto FROM data) comparedto
-- % is true when similarity() exceeds the limit set via set_limit() (default 0.3)
ON ( my_value_src % my_value_compareto )
GROUP BY my_value_src;
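
To also avoid listing the "included" similar values as rows of their own, one possible sketch is to keep a value only when nothing similar to it sorts before it. This assumes the pg_trgm extension is installed, that my_value entries are distinct, and that a threshold of 0.3 is acceptable (the threshold and the aliases a, b, c are illustrative choices, not something from the original question). Trigram similarity is not transitive, so a value can be similar to a suppressed row without being counted under any surviving row; treat this as an approximation rather than true clustering.

```sql
-- Assumes the "data" table from the question and distinct my_value entries.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Illustrative threshold; 0.3 is the pg_trgm default, tune it for your data.
SELECT set_limit(0.3);

SELECT a.my_value,
       count(*) AS similar_count       -- normally includes a's match with itself
FROM data a
JOIN data b
  ON a.my_value % b.my_value           -- true when similarity is above the limit
WHERE NOT EXISTS (                     -- keep a only if no similar value sorts before it
        SELECT 1
        FROM data c
        WHERE c.my_value % a.my_value
          AND c.my_value < a.my_value
      )
GROUP BY a.my_value
ORDER BY similar_count DESC, a.my_value;
```

Subtract 1 from similar_count if the row itself should not be counted. On a large table the self-join compares every pair of rows, so check the plan with EXPLAIN before relying on it.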

David J.
