From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Cc: ishii(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsearch2: enable non ascii stop words with C locale
Date: 2007-02-13 08:12:56
Message-ID: 45D17308.1070305@sigaev.ru
Lists: pgsql-hackers
> I know. My guess is the parser does not read the stop word file at
> least with default configuration.
The parser should not read the stop word file: that is the dictionaries' job.
>
> So if a character is not ASCII, it returns 0 even if p_isalpha returns
> 1. Is this what you expect?
No, p_islatin should return true only for Latin (plain ASCII) characters, not for national ones.
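The intended semantics can be sketched as follows (is_latin_sketch is a hypothetical helper for illustration, not the actual tsearch2 function, which operates on the parser state rather than a single byte):

```c
#include <ctype.h>

/* Sketch of the semantics described above: a character counts as
 * "latin" only when it is both alphabetic and plain 7-bit ASCII,
 * so national (non-ASCII) characters never classify as latin-word
 * (lword) tokens even though p_isalpha would accept them. */
static int is_latin_sketch(unsigned char c)
{
    return c < 128 && isalpha(c);
}
```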
>
> In our case, we added JAPANESE_STOP_WORD into english.stop then:
> select to_tsvector(JAPANESE_STOP_WORD)
> which returns words even they are in JAPANESE_STOP_WORD.
> And with the patches the problem was solved.
Please show your configuration for lexemes/dictionaries. I suspect that you have
the en_stem dictionary enabled for the lword lexeme type. A better way is to use
the 'simple' dictionary (it supports stop words the same way en_stem does) and
set it for the nlword, word, part_hword, nlpart_hword, hword and nlhword lexeme
types. Note: leave en_stem unchanged for any Latin word.
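The suggested mapping could look roughly like this (a sketch against the tsearch2 contrib schema, assuming the configuration is named 'default'; adjust ts_name and the stop word file location to your installation):

```sql
-- Map the non-Latin lexeme types to the 'simple' dictionary,
-- which handles stop words; lword stays on en_stem.
UPDATE pg_ts_cfgmap
   SET dict_name = '{simple}'
 WHERE ts_name = 'default'
   AND tok_alias IN ('nlword', 'word', 'part_hword',
                     'nlpart_hword', 'hword', 'nlhword');
```

With this in place, a Japanese stop word list attached to the 'simple' dictionary would be consulted for those token types, while English stemming remains untouched for Latin words.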
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/