From: | Cory Tucker <cory(dot)tucker(at)gmail(dot)com> |
---|---|
To: | "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> |
Cc: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Grouping By Similarity (using pg_trgm)? |
Date: | 2015-05-14 20:09:15 |
Message-ID: | CAG_=8kDLfVBXjFB_4h5i8Hx-WqVMzL0yG9_8sSOQCVuVA6zuLA@mail.gmail.com |
Lists: | pgsql-general |
That produces pretty much the same results as the CROSS JOIN I was using
before. Because each "my_value" in the table is different, grouping on
just the value always returns the full result set, along with a bunch of
essentially duplicated results.
Any other ideas/options?
On Thu, May 14, 2015 at 12:08 PM David G. Johnston <
david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
>
> On Thu, May 14, 2015 at 11:58 AM, Cory Tucker <cory(dot)tucker(at)gmail(dot)com>
> wrote:
>
>> [pg version 9.3 or 9.4]
>>
>> Suppose I have a simple table:
>>
>> create table data (
>> my_value TEXT NOT NULL
>> );
>> CREATE INDEX idx_my_value ON data USING gin(my_value gin_trgm_ops);
>>
>>
>> Now I would like to essentially do group by to get a count of all the
>> values that are sufficiently similar. I can do it using something like a
>> CROSS JOIN to join the table on itself, but then I still am getting all the
>> rows with duplicate counts.
>>
>> Is there a way to do a group by query and only return a single "my_value"
>> column and a count of the number of times other values are similar while
>> also not returning the included similar values in the output, too?
>>
>>
> Concept below - not bothering to look up the functions/operators for
> pg_trgm:
>
> SELECT my_value_src, count(*)
> FROM (SELECT my_value AS my_value_src FROM data) src
> JOIN (SELECT my_value AS my_value_compareto FROM data) comparedto
> ON ( func(my_value_src, my_value_compareto) < # )
> GROUP BY my_value_src
>
> David J.
>
>
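
A minimal sketch of one way to collapse the duplicated rows, offered as an illustration and not as something proposed in the thread. It assumes the pg_trgm extension is installed and uses its similarity() function; the 0.5 threshold, the aliases src and cmp, and the column names rep and group_size are arbitrary choices made for this example. Each value is assigned the alphabetically smallest value it is similar to as a representative, and the outer query counts values per representative.

```sql
-- Assumption: pg_trgm is available in this database.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

SELECT rep        AS my_value,
       count(*)   AS group_size
FROM (
    -- For every value, pick the smallest similar value as its representative.
    -- A value normally matches itself (similarity = 1), so it keeps a row here.
    SELECT src.my_value,
           min(cmp.my_value) AS rep
    FROM data AS src
    JOIN data AS cmp
      ON similarity(src.my_value, cmp.my_value) >= 0.5  -- threshold is an assumption
    GROUP BY src.my_value
) AS grouped
GROUP BY rep
ORDER BY group_size DESC;
```

Trigram similarity is not transitive, so this is a heuristic: a value chosen as a representative can itself belong to another group. Values that yield no trigrams at all (for example, an empty string) may not match themselves and would drop out of the result. The self-join still compares every pair of rows, so it can be slow on a large table.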