From: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, Michael Paesold <mpaesold(at)gmx(dot)at>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: How does the tsearch configuration get selected? |
Date: | 2007-06-15 16:26:37 |
Message-ID: | 4672BDBD.2070500@sigaev.ru |
Lists: | pgsql-advocacy pgsql-hackers |
> One possibility is that the user-visible specification is just a name
> (eg, "english"), but the actual filename out on the filesystem is,
> say, name.encoding.stop (eg, "english.utf8.stop") where we use PG's
> names for the encodings. We could just fail if there's not a file
> matching the database encoding, or we could try that and then try
> utf8, or some other rule. In any case I'd want it to verify and
> convert encoding as necessary while reading.
I have no strong objection to UTF8-encoded files (stop words, ispell, synonym,
or thesaurus). Just recode them after reading.
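To illustrate "recode after reading", here is a minimal Python sketch, not tsearch code. It assumes the stop-word file is stored as UTF8 with one word per line; the function name `load_stop_words` and the `db_encoding` parameter are hypothetical, and Python codec names stand in for PG's encoding names.

```python
def load_stop_words(path, db_encoding):
    """Read a UTF8 stop-word file and recode each word to db_encoding.

    Hypothetical helper for illustration only. Reading verifies the input
    (invalid UTF8 raises UnicodeDecodeError); recoding raises
    UnicodeEncodeError if a word cannot be represented in db_encoding.
    """
    words = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            word = line.strip()
            if not word:
                continue  # skip blank lines
            words.append(word.encode(db_encoding))
    return words
```

Failing loudly on a word that the database encoding cannot represent is one possible policy; skipping such words would be another.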
But configurations for different languages might differ: for example, a Russian
(or any Cyrillic-based) configuration differs from a West European one because
it is based on a different character set. So we would need non-obvious rules
to define exactly which stemmer and stop-word file should be used.
For Russian with UTF8 encoding, lword tokens should go to the English stemmer,
but for Italian they should go to the Italian stemmer. A Russian word cannot
contain any ASCII characters, whereas an Italian word may consist only of ASCII.
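The kind of rule I mean could be sketched as follows. This is only an illustration in Python, not how tsearch is implemented: the ASCII check is a simplified stand-in for the parser's lword token type, and the mapping table and function names are hypothetical.

```python
def is_ascii_word(word):
    """Simplified stand-in for the lword token type: all characters are ASCII."""
    return all(ord(ch) < 128 for ch in word)

# Hypothetical rule table: which stemmer handles ASCII-only words
# under each language configuration.
STEMMER_FOR_ASCII_WORDS = {
    "russian": "english",  # a Russian word never contains ASCII characters
    "italian": "italian",  # an Italian word may be ASCII-only
}

def choose_stemmer(config_language, word):
    """Pick a stemmer name for a word under the given language configuration."""
    if is_ascii_word(word):
        # Languages without an entry fall back to their own stemmer.
        return STEMMER_FOR_ASCII_WORDS.get(config_language, config_language)
    return config_language
```

With this table, `choose_stemmer("russian", "table")` selects the English stemmer, while `choose_stemmer("italian", "casa")` stays with the Italian one.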
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/