From: | Tim Uckun <timuckun(at)gmail(dot)com> |
---|---|
To: | hari(dot)fuchs(at)gmail(dot)com |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: how to group by similarity ? |
Date: | 2012-09-09 09:38:35 |
Message-ID: | CAGuHJrOg2k=B0WNVPfb2huhtcLzMe6vps91MsfK0y17qYmiXqA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Wed, Apr 25, 2012 at 8:34 PM, <hari(dot)fuchs(at)gmail(dot)com> wrote:
> Andreas <maps(dot)on(at)gmx(dot)net> writes:
>
>> How would I group the table so that it shows groups that have
>> similarity () > x ?
>>
>> Lets say the table looks like this:
>>
>> id, txt
>> 1, aa1
>> 2, bb1
>> 3, cc1
>> 4, bb2
>> 5, bb3
>> 6, aa2
>> ...
>>
>> How would a select look like that shows:
>>
>> id, txt, group_id
>> 1, aa1, 1,
>> 6, aa2, 1,
>> 2, bb1, 2,
>> 4, bb2, 2,
>> 5, bb3, 2,
>> 3, cc1, 3
>
Hey guys. I have a similar problem and I tried a couple of ways to
solve this including the window function described in the answer to
the original poster in this thread.
The problem I am having is that even with a trigam index and a table
with only 80,000 records the query takes forever to run. In both
cases I ended the query manually and have no idea how long it would
actually take to run. I have included the two queries below and am
hoping somebody can give me a pointer on how to accomplish with a
query that runs.
Query 1
WITH grp (t1, id, t2) AS (
SELECT t1.raw_data, t1.id, t2.raw_data
FROM schema.a t1
LEFT JOIN schema.a t2 ON t2.raw_data > t1.raw_data
WHERE t2.raw_data IS NULL OR similarity(t1.raw_data, t2.raw_data) > .75
)
SELECT t1, min(id)
FROM (
SELECT t1, id
FROM grp
UNION ALL
SELECT t2, id
FROM grp
WHERE t2 IS NOT NULL
) dummy
GROUP BY t1
ORDER BY t1
query 2
select similarity(a.raw_data,b.raw_data),*
from schema.a a, schema.a b where similarity(a.raw_data,b.raw_data) >
.75 and a.id != b.id
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2012-09-09 09:43:37 | Re: Index creation takes more time? |
Previous Message | Herouth Maoz | 2012-09-09 09:25:03 | Index creation takes more time? |