Re: Similarity search for sentences

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Janek Sendrowski <janek12(at)web(dot)de>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Similarity search for sentences
Date: 2013-12-06 15:58:22
Message-ID: 1386345502.6533.YahooMailNeo@web162905.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Janek Sendrowski <janek12(at)web(dot)de> wrote:

> I didn't know that the pg_trgm Module provides KNN search

It does, although my own experience shows that it tends to be more
appropriate for name searches or similar smaller columns than for
big text columns.  Using the war_and_peace table from another
thread:

test=# CREATE INDEX war_and_peace_linetext_trgm2 ON war_and_peace
using gist (linetext gist_trgm_ops);
CREATE INDEX
test=# VACUUM ANALYZE war_and_peace;
VACUUM
test=# -- Use a KNN search for the words.
EXPLAIN ANALYZE
SELECT * FROM war_and_peace
  ORDER BY 'ladies gentlemen provinces distance' <-> linetext
  LIMIT 10;
                                                                          QUERY PLAN                                                                          
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.28..2.74 rows=10 width=116) (actual time=42.157..42.246 rows=10 loops=1)
   ->  Index Scan using war_and_peace_linetext_trgm2 on war_and_peace  (cost=0.28..16016.42 rows=65007 width=116) (actual time=42.155..42.243 rows=10 loops=1)
         Order By: (linetext <-> 'ladies gentlemen provinces distance'::text)
 Total runtime: 42.716 ms
(4 rows)

As you can see, it pulls the requested number of rows from the
index in the order of "distance" (inverse similarity).  It is,
however, not as fast as a tsearch2 search (43 ms instead of a
fraction of a ms), and it doesn't give you options to use AND/OR as
you might like.  It is still a lot faster than scanning the whole
table and applying the test to each row.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2013-12-06 16:19:22 Re: Testing an extension without installing it
Previous Message Florian Weimer 2013-12-06 15:42:36 Testing an extension without installing it