From: | Teodor Sigaev <teodor(at)stack(dot)net> |
---|---|
To: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
Cc: | Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Hackers <pgsql-hackers(at)postgresql(dot)org>, martin_porter(at)softhome(dot)net |
Subject: | Re: contrib/tsearch |
Date: | 2002-09-09 14:19:42 |
Message-ID: | 3D7CADFE.6070209@stack.net |
Lists: | pgsql-hackers |
> Should we check for stop words before stemming or after ?
The current implementation supports both variants. Look at the dictionary
interface definition in morph.c:
typedef struct
{
    char        localename[NAMEDATALEN];
    /* init dictionary */
    void       *(*init) (void);
    /* close dictionary */
    void        (*close) (void *);
    /* find in dictionary */
    char       *(*lemmatize) (void *, char *, int *);
    int         (*is_stoplemm) (void *, char *, int);
    int         (*is_stemstoplemm) (void *, char *, int);
} DICT;
The 'is_stoplemm' method is called before 'lemmatize', and 'is_stemstoplemm' is called after it.
At the end of dict/porter_english.dct:
TABLE_DICT_START
    "C",
    setup_english_stemmer,
    closedown_english_stemmer,
    engstemming,
    NULL,
    is_stopengword
TABLE_DICT_END
dict/russian_stemming.dct:
TABLE_DICT_START
    "ru_RU.KOI8-R",
    NULL,
    NULL,
    ru_RUKOI8R_stem,
    ru_RUKOI8R_is_stopword,
    NULL
TABLE_DICT_END
So the English stemmer decides whether a lexeme is a stop word after stemming, while the Russian one decides before stemming.
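To make the calling order concrete, here is a minimal sketch of how a caller might drive the two hooks. It is not the actual code from morph.c: process_word, toy_stem, toy_is_stop and the two tables are hypothetical names invented for this illustration, NAMEDATALEN is given a stand-in value, and the int passed alongside the word is assumed to be its length.

```c
#include <stddef.h>
#include <string.h>

#define NAMEDATALEN 64			/* assumption: stand-in for the real constant */

typedef struct
{
	char		localename[NAMEDATALEN];
	void	   *(*init) (void);
	void		(*close) (void *);
	char	   *(*lemmatize) (void *, char *, int *);
	int			(*is_stoplemm) (void *, char *, int);
	int			(*is_stemstoplemm) (void *, char *, int);
} DICT;

/*
 * Hypothetical driver: returns NULL when the word is a stop word,
 * otherwise the (possibly stemmed) lemma. Any hook may be NULL.
 */
static char *
process_word(const DICT *dict, void *state, char *word, int *len)
{
	char	   *lemma = word;

	/* stop-word check BEFORE stemming */
	if (dict->is_stoplemm && dict->is_stoplemm(state, word, *len))
		return NULL;

	if (dict->lemmatize)
		lemma = dict->lemmatize(state, word, len);

	/* stop-word check AFTER stemming */
	if (lemma && dict->is_stemstoplemm &&
		dict->is_stemstoplemm(state, lemma, *len))
		return NULL;

	return lemma;
}

/* Toy stemmer (illustration only): strips one trailing 's' in place. */
static char *
toy_stem(void *state, char *word, int *len)
{
	(void) state;
	if (*len > 1 && word[*len - 1] == 's')
	{
		(*len)--;
		word[*len] = '\0';
	}
	return word;
}

/* Toy stop list (illustration only): the single word "a". */
static int
toy_is_stop(void *state, char *word, int len)
{
	(void) state;
	return len == 1 && word[0] == 'a';
}

/* Same stemmer and stop list, wired into different slots. */
static const DICT check_before = {"C", NULL, NULL, toy_stem, toy_is_stop, NULL};
static const DICT check_after = {"C", NULL, NULL, toy_stem, NULL, toy_is_stop};
```

With these toy tables, the word "as" passes through check_before and comes out as the lemma "a", because the stop list only sees the unstemmed form. With check_after the same word is dropped, because the stop list sees the stemmed form "a". That is the practical difference between filling the fifth and the sixth slot of the table.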
--
Teodor Sigaev
teodor(at)stack(dot)net