From: | Beena Emerson <memissemerson(at)gmail(dot)com> |
---|---|
To: | Janek Sendrowski <janek12(at)web(dot)de> |
Cc: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Fastest Index/Algorithm to find similar sentences |
Date: | 2013-07-31 13:56:22 |
Message-ID: | CAOG9ApEaGjHaFtm2XrVGYc6WbYFva3JzLxa6ANSFFyW_-mFkQA@mail.gmail.com |
Lists: | pgsql-general |
I am sorry, I just re-read your mail and realized you have already tried
with pg_trgm.
On Wed, Jul 31, 2013 at 7:23 PM, Beena Emerson <memissemerson(at)gmail(dot)com> wrote:
> On Sat, Jul 27, 2013 at 10:34 PM, Janek Sendrowski <janek12(at)web(dot)de> wrote:
>
>> Hi Sergey Konoplev,
>>
>> Suppose I'm searching for a sentence like "The tiger is the largest cat
>> species".
>>
>> Right now I can only find sentences that include all of the words "tiger,
>> largest, cat, species", but I would also like to get sentences that contain
>> only three or even two of these words.
>>
>> Janek
>>
>
> Hi,
>
> You can use the similarity functions of pg_trgm.
>
> Example:
> =# \d+ test
> Table "public.test"
> Column | Type | Modifiers | Storage | Stats target | Description
> --------+------+-----------+----------+--------------+-------------
> col | text | | extended | |
> Indexes:
> "test_idx" gin (col gin_trgm_ops)
> Has OIDs: no
>
> =# SELECT * FROM test;
> col
> -----------------------------------------
> The tiger is the largest cat species
> The cheetah is the fastest cat species
> The peacock is the largest bird species
> (3 rows)
>
> =# SELECT show_limit();
> show_limit
> ------------
> 0.3
> (1 row)
>
> =# SELECT col, similarity(col, 'The tiger is the largest cat species') AS sml
> FROM test WHERE col % 'The tiger is the largest cat species'
> ORDER BY sml DESC, col;
> col | sml
> -----------------------------------------+----------
> The tiger is the largest cat species | 1
> The peacock is the largest bird species | 0.511111
> The cheetah is the fastest cat species | 0.466667
> (3 rows)
>
> =# SELECT set_limit(0.5);
> set_limit
> -----------
> 0.5
> (1 row)
>
> =# SELECT col, similarity(col, 'The tiger is the largest cat species') AS sml
> FROM test WHERE col % 'The tiger is the largest cat species'
> ORDER BY sml DESC, col;
> col | sml
> -----------------------------------------+----------
> The tiger is the largest cat species | 1
> The peacock is the largest bird species | 0.511111
> (2 rows)
>
> =# SELECT set_limit(0.9);
> set_limit
> -----------
> 0.9
> (1 row)
>
> =# SELECT col, similarity(col, 'The tiger is the largest cat species') AS sml
> FROM test WHERE col % 'The tiger is the largest cat species'
> ORDER BY sml DESC, col;
> col | sml
> --------------------------------------+-----
> The tiger is the largest cat species | 1
> (1 row)
>
>
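> With a higher limit, only closer matches are returned: the % operator
> filters out every row whose similarity to the search string falls below
> the limit.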
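
To make the scores in the example less opaque, here is a minimal Python sketch of how a trigram similarity of this kind can be computed. It only approximates what pg_trgm does and is not its actual implementation: the helper names (trigrams, similarity, similar_rows) are made up for illustration, the word splitting assumes plain ASCII text, and treating the limit as inclusive (>=) is an assumption.

```python
import re

ROWS = [
    "The tiger is the largest cat species",
    "The cheetah is the fastest cat species",
    "The peacock is the largest bird species",
]
QUERY = "The tiger is the largest cat species"


def trigrams(text):
    """Return the set of trigrams of `text`, roughly the way pg_trgm builds it.

    Assumption: lowercase the text, split it into alphanumeric words, pad each
    word with two leading spaces and one trailing space, and collect every
    3-character window. Duplicates collapse because the result is a set.
    """
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def similar_rows(rows, query, limit=0.3):
    """Hypothetical stand-in for `WHERE col % query ORDER BY sml DESC, col`."""
    scored = [(row, similarity(row, query)) for row in rows]
    matches = [(row, sml) for row, sml in scored if sml >= limit]
    return sorted(matches, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for limit in (0.3, 0.5, 0.9):
        print("limit", limit)
        for row, sml in similar_rows(ROWS, QUERY, limit):
            print("  %-40s %.6f" % (row, sml))
```

On the three sample rows this gives 23 shared trigrams out of 45 distinct ones for the peacock sentence (0.511111) and 21 out of 45 for the cheetah sentence (0.466667), which is why raising the limit to 0.5 drops the cheetah row and raising it to 0.9 leaves only the exact match.
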
>
>
> --
> Beena Emerson
>
>
--
Beena Emerson