From: | "Dmitry Koterov" <dmitry(at)koterov(dot)ru> |
---|---|
To: | "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su> |
Cc: | "Postgres General" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: How to switch off Snowball stemmer for tsearch2? |
Date: | 2007-08-22 19:21:54 |
Message-ID: | d7df81620708221221h30a575c7m292de73bfa34e6fc@mail.gmail.com |
Lists: | pgsql-general |
Suppose I cannot add such synonyms, because:
1. There are a lot of surnames, and I cannot take care of all of them.
2. After adding a new surname I have to recalculate all full-text indices,
which costs too much (about 10 days to complete the recalculation).
So I need exactly what I asked for: to switch OFF stem guessing when a word
is not in the dictionary.
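
To make the request concrete, here is a minimal sketch of the behaviour in question. It assumes only that the dictionaries configured for a token type are tried in order and that the first one to recognize the word wins. All names in it (`ispell`, `toy_stemmer`, `simple`, `lexize`, `known`) are hypothetical Python stand-ins, not the tsearch2 API, and the suffix-stripping rule is a toy, not the Snowball algorithm.

```python
# Toy model of a dictionary chain (illustration only, not tsearch2 code).

def ispell(word, known):
    """Return the normalized form if the word is known, else None."""
    return known.get(word.lower())

def toy_stemmer(word):
    """Stand-in for a stemmer: always answers, guessing by stripping a suffix."""
    w = word.lower()
    return w[:-2] if w.endswith("ov") else w

def simple(word):
    """Stand-in for a 'simple' dictionary: lowercases the word, never guesses."""
    return word.lower()

def lexize(word, chain):
    """Try each dictionary in order; the first non-None answer wins."""
    for dictionary in chain:
        lexeme = dictionary(word)
        if lexeme is not None:
            return lexeme
    return None  # no dictionary in the chain recognized the word

known = {"ivan": "ivan"}  # hypothetical ispell content; surnames are absent

def ispell_dict(word):
    return ispell(word, known)

with_stemmer = [ispell_dict, toy_stemmer]   # unknown words get a guessed stem
without_stemmer = [ispell_dict, simple]     # unknown words are kept as they are

print(lexize("Ivanov", with_stemmer))     # ivan
print(lexize("Ivanov", without_stemmer))  # ivanov
```

With the stemmer as the fallback, "Ivanov" collapses into "ivan"; with a non-guessing fallback it stays "ivanov". If the chain contained only the ispell stand-in, the unknown word would not be recognized at all, which is why simply removing the stemmer is probably not enough on its own.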
On 8/22/07, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
>
> On Wed, 22 Aug 2007, Dmitry Koterov wrote:
>
> > Hello.
> >
> > We use ispell dictionaries for tsearch2 (ru_ispell_cp1251).
> > Now Snowball stemmer is also configured.
> >
> > How to properly switch OFF Snowball stemmer for Russian without turning off
> > ispell stemmer? (It is really needed, because "Ivanov" is not the same as
> > "Ivan".)
> > Is it enough and correct to simply delete the row from pg_ts_dict or not?
> >
> > Here is the dump of pg_ts_dict table:
>
> don't use dump, plain select would be better. In your case, I'd
> suggest to follow standard way - create synonym file like
> ivanov ivanov
> and use it before other dictionaries. Synonym dictionary will recognize
> 'Ivanov' and return 'ivanov'.
>
> >
> > dict_name | dict_init | dict_initoption | dict_lexize | dict_comment
> > en_ispell | spell_init(internal) | DictFile=/usr/lib/ispell/english.med,AffFile=/usr/lib/ispell/english.aff,StopFile=/usr/share/pgsql/contrib/english.stop | spell_lexize(internal,internal,integer) |
> > en_stem | snb_en_init(internal) | contrib/english.stop | snb_lexize(internal,internal,integer) | English Stemmer. Snowball.
> > ispell_template | spell_init(internal) | | spell_lexize(internal,internal,integer) | ISpell interface. Must have .dict and .aff files
> > ru_ispell_cp1251 | spell_init(internal) | DictFile=/usr/lib/ispell/russian.med,AffFile=/usr/lib/ispell/russian.aff,StopFile=/usr/share/pgsql/contrib/russian.stop.cp1251 | spell_lexize(internal,internal,integer) |
> > ru_stem_cp1251 | snb_ru_init_cp1251(internal) | contrib/russian.stop.cp1251 | snb_lexize(internal,internal,integer) | Russian Stemmer. Snowball. WINDOWS (cp1251) Encoding
> > ru_stem_koi8 | snb_ru_init_koi8(internal) | contrib/russian.stop | snb_lexize(internal,internal,integer) | Russian Stemmer. Snowball. KOI8 Encoding
> > ru_stem_utf8 | snb_ru_init_utf8(internal) | contrib/russian.stop.utf8 | snb_lexize(internal,internal,integer) | Russian Stemmer. Snowball. UTF8 Encoding
> > simple | dex_init(internal) | | dex_lexize(internal,internal,integer) | Simple example of dictionary.
> > synonym | syn_init(internal) | | syn_lexize(internal,internal,integer) | Example of synonym dictionary
> > thesaurus_template | thesaurus_init(internal) | | thesaurus_lexize(internal,internal,integer,internal) | Thesaurus template, must be pointed Dictionary and DictFile
> >
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
>