pg_trgm extension and theory

From: alexandros_e <alexandros(dot)ef(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: pg_trgm extension and theory
Date: 2014-02-22 18:11:50
Message-ID: 1393092710590-5793180.post@n5.nabble.com
Lists: pgsql-general

Hello to all experts,

I am considering using the pg_trgm extension in a research publication, since
initial results seem promising. The index seems to work quite fast for
finding similar text and significantly reduces query time. The problem is
that I do not know the theory behind it or the exact method it uses.
My questions:
a) It probably uses the q-gram method (basically 3-grams only). Does it
also create 2-grams and 1-grams to determine similarity?
b) About the index (either GiST or GIN): is it based on an RD-tree? If not,
what is the exact indexing method it uses?
c) Will it work for arbitrary UTF-8 characters / strings? I ask because the
documentation only mentions ASCII.
d) I also found the http://pgsimilarity.projects.pgfoundry.org/ project, which
provides similarity functions for strings. Does the pg_trgm extension have
anything to do with it? Since pgsimilarity seems abandoned, is there another
project that (1) uses some kind of indexing for similarity and (2) provides
most of the string similarity functions that pgsimilarity does?
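
To make question a) concrete, here is a minimal sketch of the kind of 3-gram
similarity I have in mind. It is only an illustration under stated
assumptions: words are lowercased, split on whitespace, and padded with two
leading spaces and one trailing space, and similarity is the size of the
intersection of the two trigram sets divided by the size of their union. The
function names trigrams and similarity are hypothetical, and whether pg_trgm
computes exactly this is what I am asking.

```python
def trigrams(text):
    """Return the set of 3-grams of every whitespace-separated word.

    Assumption: each word is lowercased and padded with two leading
    spaces and one trailing space before the 3-grams are extracted.
    """
    grams = set()
    for word in text.lower().split():
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (set overlap)."""
    ga, gb = trigrams(a), trigrams(b)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)
```

Under these assumptions, trigrams("cat") yields four 3-grams and no 2-grams
or 1-grams; short prefixes and suffixes are covered only through the padding
spaces.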

Thanks

