Re: Full Text Search combined with Fuzzy

From:	Nicolas Paris <niparisco(at)gmail(dot)com>
To:	Oleg Bartunov <obartunov(at)gmail(dot)com>
Cc:	Postgres General <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Full Text Search combined with Fuzzy
Date:	2017-03-03 12:49:05
Message-ID:	20170303124904.GB781@gmail.com
Lists:	pgsql-general

On 27 Feb 2017 at 10:32, Oleg Bartunov wrote:
>
>
> On Sun, Feb 26, 2017 at 3:52 PM, Nicolas Paris <niparisco(at)gmail(dot)com> wrote:
>
> Hello,
>
> AFAIK there is no built-in way to combine full text search and fuzzy
> matching
> (https://www.postgresql.org/docs/current/static/fuzzystrmatch.html)
> For example, phrase searching with typos in it.
>
> First, I don't know whether PostgreSQL's competitors (Lucene-based...) are
> able to do so.
>
>
> Usually, https://www.postgresql.org/docs/current/static/pgtrgm.html is used for
> this.
>

Hi Oleg,

Thanks. I thought pg_trgm could not index my long texts because of the
limitation of 8191 bytes per index row for btree.

Then I found out it is possible to use pg_trgm with a GIN/GiST index.
My final use case is phrase mining in texts.

I want my application to return texts that approximately contain the
user's input.

E.g.: the user searches for "Hello Word";
a text containing "blah blah blah hello world blah blah blah" would be
returned.

Test:
postgres=# CREATE table test_trgm (texts text);
CREATE TABLE
postgres=# CREATE INDEX ON test_trgm USING GIN(texts gin_trgm_ops);
CREATE INDEX
postgres=# SET enable_seqscan = OFF;
SET
postgres=# insert into test_trgm VALUES ('blah blah blah hello world blah blah blah');
INSERT 0 1
postgres=# insert into test_trgm VALUES ('blah blah blah hello word blah blah blah');
INSERT 0 1
postgres=# SELECT texts, similarity(texts, 'hello word') FROM test_trgm WHERE texts % 'hello word';
                   texts                   | similarity
-------------------------------------------+------------
 blah blah blah hello world blah blah blah |   0.473684
 blah blah blah hello word blah blah blah  |     0.6875
(2 rows)

postgres=# EXPLAIN SELECT texts, similarity(texts, 'hello word') FROM test_trgm WHERE texts % 'hello word';
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Bitmap Heap Scan on test_trgm  (cost=52.01..56.03 rows=1 width=32)
   Recheck Cond: (texts % 'hello word'::text)
   ->  Bitmap Index Scan on test_trgm_texts_idx  (cost=0.00..52.01 rows=1 width=0)
         Index Cond: (texts % 'hello word'::text)
(4 rows)

Conclusion: if I take 0.4 as my threshold, would this approach meet
my requirements?
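
To make the two scores above easier to interpret, here is a minimal Python
sketch of how a trigram similarity of this kind can be computed. It assumes
pg_trgm's documented behaviour (lowercase the text, split it into words, pad
each word with two leading spaces and one trailing space, then compare the
sets of trigrams as shared / total distinct) and only handles ASCII letters
and digits. The names trigrams, similarity and THRESHOLD are illustrative
and are not part of pg_trgm.

```python
import re

THRESHOLD = 0.4  # the cut-off proposed above; an assumption, not a pg_trgm default


def trigrams(text):
    """Return the set of trigrams of `text`, approximating pg_trgm.

    Each lowercase alphanumeric word is padded with two leading spaces
    and one trailing space before the 3-character windows are taken.
    """
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b):
    """Shared trigrams divided by distinct trigrams across both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    query = "hello word"
    texts = [
        "blah blah blah hello world blah blah blah",
        "blah blah blah hello word blah blah blah",
        # A longer text with many unrelated words around the phrase.
        "the quick brown fox says hello world to the lazy dog near the river bank",
    ]
    for text in texts:
        score = similarity(text, query)
        print(f"{score:.6f}  keep={score >= THRESHOLD}  {text}")
```

For the two rows in the test table this gives 9/19 (about 0.473684) and
11/16 (0.6875), the same values as the session above, so both pass a 0.4
threshold. The third text is included to show a caveat: the denominator
counts every distinct trigram of the whole text, so the score for the same
embedded phrase drops as the surrounding text grows, and a fixed threshold
on whole-text similarity may stop matching long documents.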

Thanks for the help!

>
> Second, is such a feature on the roadmap?
>
> Third, I wonder whether it is a good idea to use the PostgreSQL synonyms
> feature for this purpose
> (https://www.postgresql.org/docs/current/static/textsearch-dictionaries.html).
> I mean, building up a synonym dictionary containing typos. E.g.:
>
> postgres pgsql
> postgresql pgsql
> postgrez pgsql
> postgre pgsql
> gogle googl
> gooogle googl
>
> There are multiple ways to build such a dictionary. But my question is
> about the implementation of dictionaries in PostgreSQL: is PostgreSQL
> designed to handle dictionaries with billions of entries?
>
>
> A dictionary is just a program, so it's up to the developer to write an
> efficient program that can deal with billions of entries. The synonym
> dictionary specifically is not intended to work with a lot of entries. BTW,
> have a look at the contrib/dict_xsyn dictionary, which is more flexible than
> synonym.
>
>
> Thanks in advance for your answers,

In response to

Re: Full Text Search combined with Fuzzy at 2017-02-27 09:32:35 from Oleg Bartunov

Responses

Re: Full Text Search combined with Fuzzy at 2017-03-03 13:08:19 from Artur Zakirov
