Re: pg_trgm

From:	Tatsuo Ishii <ishii(at)postgresql(dot)org>
To:	peter_e(at)gmx(dot)net
Cc:	ishii(at)sraoss(dot)co(dot)jp, tgl(at)sss(dot)pgh(dot)pa(dot)us, ishii(at)postgresql(dot)org, andres(at)anarazel(dot)de, pgsql-hackers(at)postgresql(dot)org, teodor(at)sigaev(dot)ru
Subject:	Re: pg_trgm
Date:	2010-05-27 15:46:19
Message-ID:	20100528.004619.39470103.t-ishii@sraoss.co.jp
Lists:	pgsql-hackers

> I don't know about Japanese, but the locale approach works just fine for
> other agglutinative languages. I would rather suspect that it is the
> trigram approach that might be rather useless for such languages,
> because you are going to get a lot of similarity hits for the affixes.

I'm not sure what you mean by "affixes", but let me explain.

A Japanese sentence consists of words. The problem is that words are
not separated by spaces (agglutinative). So most text tools, such as
text search, need a preprocessing step that finds word boundaries by
looking them up in dictionaries (plus a smart grammar analysis
routine). In that process "affixes" can be identified and perhaps
removed from the target word group to be used for text search (note
that removing affixes is not relevant to locale). Once we have a
space-separated sentence, it can be processed by text search or by
pg_trgm just the same as English. (Note that this preprocessing is
done outside the PostgreSQL world.) The only difference is that a
"word" can consist of non-ASCII letters.
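
To illustrate the last step, here is a rough Python sketch of trigram
extraction over text that has already been split into space-separated
words. It is not pg_trgm's actual code. It assumes the padding rule
described in the pg_trgm documentation (two spaces before and one space
after each word) and set-based similarity (shared trigrams divided by
all distinct trigrams). It splits on whitespace only, which is a
simplification, and the function names and the sample Japanese
segmentation are made up for illustration.

```python
def trigrams(text):
    """Return the set of character trigrams for space-separated text.

    Assumption: each word is padded with two leading spaces and one
    trailing space, as the pg_trgm documentation describes.
    """
    result = set()
    for word in text.lower().split():
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# ASCII and non-ASCII words go through exactly the same code path.
print(sorted(trigrams("cat")))    # ['  c', ' ca', 'at ', 'cat']
print(sorted(trigrams("東京")))

# Hypothetical output of an external word segmenter: words separated
# by spaces before the text ever reaches the trigram step.
print(similarity("東京 タワー", "東京 都"))
```

The point is only that nothing in the trigram step cares whether a
letter is ASCII, as long as the word boundaries were found beforehand.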
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

In response to

Re: pg_trgm at 2010-05-27 15:24:36 from Peter Eisentraut

Responses

Re: pg_trgm at 2010-05-27 18:01:01 from Peter Eisentraut
