Re: How to switch off Snowball stemmer for tsearch2?

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Dmitry Koterov <dmitry(at)koterov(dot)ru>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: How to switch off Snowball stemmer for tsearch2?
Date: 2007-08-23 12:05:27
Message-ID: Pine.LNX.4.64.0708231556590.2727@sn.sai.msu.ru
Lists: pgsql-general

On Thu, 23 Aug 2007, Dmitry Koterov wrote:

>>
>>> Now
>>>
>>> select lexize('ru_ispell_cp1251', 'Дмитриев') -> "Дмитрий"
>>> select lexize('ru_ispell_cp1251', 'Иванов') -> "Иван"
>>> - it is completely wrong!
>>>
>>> I have a database with all Russian names; is it possible to use it (how?)
>>> to
>>
>> if you have such a database, why not just write a special dictionary and
>> put it in front?
>
>
> Because, of course, this is a database of Russian first NAMES, NOT a database
> of surnames.
>
>
>>> make lexize() not convert "Ivanov" to "Ivan" even if the ispell
>>> dictionary contains an entry for "Ivan"? So, this pseudo-code logic is
>>> needed:
>>>
>>> function new_lexize($string) {
>>> $stem = lexize('ru_ispell_cp1251', $string);
>>> if ($stem in names_database) return $string; else return $stem;
>>> }
>>>
>>> Maybe tsearch2 implements this logic already?

write your own dictionary, which implements any logic you need. In your
case it's just a wrapper around ispell that returns the original string,
not the stem. See the example
http://www.sai.msu.su/~megera/postgres/fts/doc/fts-intdict-xmp.html
and a Russian article
http://www.sai.msu.su/~megera/postgres/talks/fts_pgsql_intro.html#ftsdict
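A minimal sketch of the wrapper logic from the pseudo-code, modelled in Python rather than as a real tsearch2 dictionary (which would normally be written in C, as in the linked example). It assumes the ispell lookup can be passed in as a function that returns a list of normal forms, or None (standing in for SQL NULL) when the word is unknown, and that the names database fits in a set. `make_surname_guard`, `toy_ispell` and the transliterated sample words are hypothetical.

```python
from typing import Callable, Optional

# A "dictionary" is modelled as: word -> list of lexemes, or None if unknown
# (None plays the role of the NULL that lexize() returns for unknown words).
Lexize = Callable[[str], Optional[list[str]]]


def make_surname_guard(ispell_lexize: Lexize, first_names: set[str]) -> Lexize:
    """Wrap an ispell-like lookup so that words whose stem is a known
    first name (e.g. the surname "ivanov" -> "ivan") are returned unchanged."""

    def lexize(word: str) -> Optional[list[str]]:
        stems = ispell_lexize(word)
        if stems is None:
            return None  # ispell does not know the word; let the next dictionary try
        if any(stem in first_names for stem in stems):
            return [word]  # the stem is a first name: keep the original form
        return stems  # ordinary word: return ispell's normal forms

    return lexize


# Hypothetical toy data, transliterated for readability.
toy_ispell = {"ivanov": ["ivan"], "dmitriev": ["dmitry"], "domami": ["dom"]}
guarded = make_surname_guard(toy_ispell.get, {"ivan", "dmitry"})

print(guarded("ivanov"))  # ['ivanov']
print(guarded("domami"))  # ['dom']
print(guarded("xyz"))     # None
```

Note that, exactly like the pseudo-code, this also keeps inflected forms of the first names themselves (any word whose stem is "ivan") unchanged.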

>>
>> sure, it's how text search mapping works.
>
>
> Could you please elaborate?

you create a dictionary surnames_dict and configure
pg_ts_cfgmap to process tokens of type nlword with the dictionaries
surnames_dict, ru_ispell, ru_stem (in that order), for example.
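To show what such a mapping does for one token type, here is a small Python model (not tsearch2 code) of a dictionary chain. It assumes the usual tsearch2 convention: a dictionary returns NULL (None here) for a word it does not recognize, so the next dictionary in the list is tried, and the first non-NULL answer is used. `lexize_chain`, the toy word lists and the fake stemmer are hypothetical; `surnames_dict`, `ru_ispell` and `ru_stem` only borrow the names from the mapping above.

```python
from typing import Callable, Optional, Sequence

# word -> list of lexemes, [] for a stop word, or None if the word is unknown
Dictionary = Callable[[str], Optional[list[str]]]


def lexize_chain(word: str, dictionaries: Sequence[Dictionary]) -> list[str]:
    """Try each dictionary in order; the first one that does not return None wins."""
    for dictionary in dictionaries:
        result = dictionary(word)
        if result is not None:
            return result
    return []  # no dictionary recognized the word, so nothing is indexed


# Hypothetical toy data, transliterated for readability.
FIRST_NAMES = {"ivan", "dmitry"}
TOY_ISPELL = {"ivanov": ["ivan"], "dmitriev": ["dmitry"], "domami": ["dom"]}


def ru_ispell(word: str) -> Optional[list[str]]:
    return TOY_ISPELL.get(word)  # tiny fake ispell: None for unknown words


def surnames_dict(word: str) -> Optional[list[str]]:
    stems = ru_ispell(word)
    if stems is not None and any(s in FIRST_NAMES for s in stems):
        return [word]  # name-derived word: keep it as is
    return None  # not name-derived: pass it on to the next dictionary


def ru_stem(word: str) -> Optional[list[str]]:
    # Crude fake stemmer; like a real one, it always returns a lexeme.
    return [word[:-3] if word.endswith("ami") else word]


chain = [surnames_dict, ru_ispell, ru_stem]
print(lexize_chain("ivanov", chain))                 # ['ivanov']
print(lexize_chain("domami", chain))                 # ['dom']
print(lexize_chain("kotami", chain))                 # ['kot']
print(lexize_chain("ivanov", [ru_ispell, ru_stem]))  # ['ivan']
```

Putting surnames_dict first lets it claim name-derived words before ispell reduces them; the stemmer goes last because it accepts every word, so nothing after it would ever be consulted.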

>
> Of course, I can generate all word forms of all Russian names using ispell
> and then subtract this full list from the Ispell dictionary (so I would
> remove "Ivan", "Ivanami" etc. from it). But possibly tsearch2 already has
> this subtraction algorithm.
>

don't do that! Just go the plain way.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
