Re: contrib/tsearch

From: Teodor Sigaev <teodor(at)stack(dot)net>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Hackers <pgsql-hackers(at)postgresql(dot)org>, martin_porter(at)softhome(dot)net
Subject: Re: contrib/tsearch
Date: 2002-09-09 14:19:42
Message-ID: 3D7CADFE.6070209@stack.net
Lists: pgsql-hackers

> Should we check for stop words before stemming or after ?

The current implementation supports both variants. Look at the dictionary
interface definition in morph.c:

typedef struct
{
    char        localename[NAMEDATALEN];
    /* init dictionary */
    void       *(*init) (void);
    /* close dictionary */
    void        (*close) (void *);
    /* find in dictionary */
    char       *(*lemmatize) (void *, char *, int *);
    /* stop-word check, called before lemmatize */
    int         (*is_stoplemm) (void *, char *, int);
    /* stop-word check, called after lemmatize */
    int         (*is_stemstoplemm) (void *, char *, int);
} DICT;

The 'is_stoplemm' method is called before 'lemmatize', and 'is_stemstoplemm'
is called after it. At the end of dict/porter_english.dct:
TABLE_DICT_START
    "C",
    setup_english_stemmer,
    closedown_english_stemmer,
    engstemming,
    NULL,
    is_stopengword
TABLE_DICT_END

dict/russian_stemming.dct:
TABLE_DICT_START
    "ru_RU.KOI8-R",
    NULL,
    NULL,
    ru_RUKOI8R_stem,
    ru_RUKOI8R_is_stopword,
    NULL
TABLE_DICT_END

So the English stemmer decides whether a lexeme is a stop word after stemming,
while the Russian one decides before stemming.
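
To illustrate the calling order, here is a minimal sketch of how a caller
might drive the two hooks. It is not the actual morph.c code. The names
process_lexeme, toy_stem, toy_is_stop, check_after and check_before are
hypothetical, NAMEDATALEN is given a stand-in value, and I assume the int *
argument of 'lemmatize' carries the word length in and out.

```c
#include <stddef.h>
#include <string.h>

#define NAMEDATALEN 64          /* assumption: stand-in for the PostgreSQL constant */

typedef struct
{
    char        localename[NAMEDATALEN];
    void       *(*init) (void);
    void        (*close) (void *);
    char       *(*lemmatize) (void *, char *, int *);
    int         (*is_stoplemm) (void *, char *, int);
    int         (*is_stemstoplemm) (void *, char *, int);
} DICT;

/* Hypothetical toy stemmer: strips one trailing 's' in place. */
char *
toy_stem(void *obj, char *word, int *len)
{
    (void) obj;
    if (*len > 1 && word[*len - 1] == 's')
    {
        (*len)--;
        word[*len] = '\0';
    }
    return word;
}

/* Hypothetical toy stop list holding only "the" and "it". */
int
toy_is_stop(void *obj, char *word, int len)
{
    (void) obj;
    return (len == 3 && strncmp(word, "the", 3) == 0) ||
           (len == 2 && strncmp(word, "it", 2) == 0);
}

/*
 * Hypothetical driver: returns NULL if the lexeme is a stop word,
 * otherwise the lemma. A NULL hook is simply skipped.
 */
char *
process_lexeme(DICT *dict, void *obj, char *word, int *len)
{
    char       *lemma;

    /* check before stemming, if the dictionary defines it */
    if (dict->is_stoplemm && dict->is_stoplemm(obj, word, *len))
        return NULL;

    lemma = dict->lemmatize ? dict->lemmatize(obj, word, len) : word;

    /* check after stemming, if the dictionary defines it */
    if (dict->is_stemstoplemm && dict->is_stemstoplemm(obj, lemma, *len))
        return NULL;

    return lemma;
}

/* Laid out like the English entry: stop check after stemming. */
DICT check_after = {"C", NULL, NULL, toy_stem, NULL, toy_is_stop};

/* Laid out like the Russian entry: stop check before stemming. */
DICT check_before = {"C", NULL, NULL, toy_stem, toy_is_stop, NULL};
```

With these toy functions the word "its" is dropped by check_after (it stems
to "it", which is in the stop list), but check_before keeps it and returns
"it", because the stop list is consulted only on the unstemmed form.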

--
Teodor Sigaev
teodor(at)stack(dot)net
