From: | "Dmitry Koterov" <dmitry(at)koterov(dot)ru> |
---|---|
To: | "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su> |
Cc: | "Postgres General" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: How to switch off Snowball stemmer for tsearch2? |
Date: | 2007-08-23 09:56:46 |
Message-ID: | d7df81620708230256m292ae23fk3aeb1c9c9e756c6@mail.gmail.com |
Lists: | pgsql-general |
>
> > Now
> >
> > select lexize('ru_ispell_cp1251', 'Дмитриев') -> "Дмитрий"
> > select lexize('ru_ispell_cp1251', 'Иванов') -> "Иван"
> > - it is completely wrong!
> >
> > I have a database of all Russian names. Is it possible to use it (and how?) to
>
> If you have such a database, why don't you just write a special dictionary and
> put it in front?
Because, of course, this is a database of Russian first NAMES, NOT a database of
surnames.
> > make lexize() not convert "Ivanov" to "Ivan", even if the ispell
> > dictionary contains an entry for "Ivan"? That is, logic like this pseudo-code is
> > needed:
> >
> > function new_lexize($string) {
> > $stem = lexize('ru_ispell_cp1251', $string);
> > if ($stem in names_database) return $string; else return $stem;
> > }
> >
> > Maybe tsearch2 implements this logic already?
>
> Sure, that's how text search mapping works.
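To illustrate the pseudo-code together with the mapping behavior mentioned here, the following is a minimal Python model, not tsearch2 code. As we understand the mapping, a configuration assigns each token type an ordered list of dictionaries, and a word is passed to them in turn until one recognizes it; a dictionary that does not know the word returns NULL and lets the next one try. The functions `ispell_lexize`, `new_lexize`, `simple_lexize` and `map_word`, and the tiny word tables, are hypothetical stand-ins for `lexize('ru_ispell_cp1251', ...)`, the names database and a fallback dictionary.

```python
# Minimal model (not tsearch2 code) of Dmitry's new_lexize() idea plus a
# tsearch2-style mapping: dictionaries are tried in order and the first
# one that returns a non-None result wins.

# Hypothetical stand-in for lexize('ru_ispell_cp1251', word): a few
# hard-coded ispell results. A missing key means "ispell does not know it".
ISPELL_STEMS = {
    "дмитриев": ["дмитрий"],  # surname Dmitriev -> first name Dmitry
    "иванов": ["иван"],       # surname Ivanov -> first name Ivan
    "книгами": ["книга"],     # "with books" -> "book"
}

# Hypothetical "names_database": Russian first names, lowercased.
NAMES_DATABASE = {"дмитрий", "иван"}


def ispell_lexize(word):
    """Return the stub ispell lexemes for word, or None if unknown."""
    return ISPELL_STEMS.get(word.lower())


def new_lexize(word):
    """Dmitry's pseudo-code: keep the word itself if its stem is a first name."""
    stems = ispell_lexize(word)
    if stems is None:
        return None  # not recognized: let the next dictionary try
    if any(stem in NAMES_DATABASE for stem in stems):
        return [word.lower()]  # e.g. "Иванов" is not reduced to "иван"
    return stems


def simple_lexize(word):
    """Hypothetical fallback dictionary: accepts any word, lowercased."""
    return [word.lower()]


def map_word(word, dictionaries):
    """Try each dictionary in order; return the first non-None result."""
    for dictionary in dictionaries:
        lexemes = dictionary(word)
        if lexemes is not None:
            return lexemes
    return None  # no dictionary recognized the word


if __name__ == "__main__":
    chain = [new_lexize, simple_lexize]
    for w in ["Иванов", "Дмитриев", "книгами", "Петров"]:
        print(w, "->", map_word(w, chain))
```

With this rule, any inflected form of a first name whose ispell stem is that name (for example "Иванами") would also be left unstemmed, because the check only looks at the stem.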
Could you please elaborate?
Of course, I could generate all word forms of all Russian names using ispell and
then subtract this full list from the ispell dictionary (so that "Ivan", "Ivanami",
etc. are removed from it). But possibly tsearch2 already has such a subtraction
algorithm.
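A simpler variant of the subtraction idea is sketched below, assuming the ispell dictionary file stores one stem per line as `stem` or `stem/FLAGS`, with the affix flags generating the inflected forms. Instead of expanding every form and subtracting it, it drops the name stems themselves, so ispell can no longer generate forms such as "Ivanami" (and presumably "Ivanov", given the `lexize()` results quoted above) from them. The function `strip_names` and the sample entries are hypothetical; the filtered list would then be saved as the word list for a new ispell dictionary.

```python
# Minimal sketch: remove first-name stems from an ispell-style word list.
# Assumption: each line is "stem" or "stem/FLAGS", where the affix flags
# generate the inflected forms of that stem.


def strip_names(dict_lines, names):
    """Return dict_lines without entries whose stem is a known first name."""
    name_set = {name.lower() for name in names}
    kept = []
    for line in dict_lines:
        stem = line.split("/", 1)[0].strip().lower()
        if stem not in name_set:
            kept.append(line)
    return kept


if __name__ == "__main__":
    ispell_dict = ["иван/AB", "книга/C", "дмитрий/AB", "стол"]
    print(strip_names(ispell_dict, ["Иван", "Дмитрий"]))
    # prints ['книга/C', 'стол']
```

One trade-off: a stem that is both a first name and a common word (for example "вера", the name Vera and also the noun "faith") disappears in both senses.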
> Dmitry, it seems your company could be my client :)
Not now, thank you. Maybe later.