Re: contrib/tsearch

From: "Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au>
To: "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>
Cc: "Hackers" <pgsql-hackers(at)postgresql(dot)org>, <martin_porter(at)softhome(dot)net>
Subject: Re: contrib/tsearch
Date: 2002-09-06 04:01:30
Message-ID: GNELIHDDFBOCMGBFGEFOGEBLCEAA.chriskl@familyhealth.com.au
Lists: pgsql-hackers

> Should we check for stop words before stemming or after ?

I think you should check before stemming.

> In the first case we have to collect all forms of stop-words
> which is doable
> but difficult to maintain, in latter - we'll have current problem.

Looking at the list of stopwords you sent me, Oleg, only about 1 of the
120 stopwords needs to have all of its word forms added, so I don't think
it'll be a maintenance problem. Stopwords in general don't have different
word forms.

e.g. her, his, i, and, etc. They don't have different forms. In fact, the
_only_ words in the stopword list that need more than one form are yourself
and yourselves. Actually, according to dictionary.com 'ourself' is also a
word. 'themself' isn't though. One other I'm not sure about is:

'veri' - I assume this is stemmed 'very', so why not just use 'very'?

So, why don't you change tsearch to check for stop words _before_ stemming?
I can give you a list of revised stopwords that haven't been stemmed, with
all forms of the words.
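
To make the difference concrete, here is a minimal sketch in Python. It is
not tsearch's actual C code. The names toy_stem, STOPWORDS,
index_terms_check_before and index_terms_check_after are all made up for
illustration, and toy_stem is a crude suffix stripper standing in for the
real stemmer, so its output only approximates what a real stemmer produces.

```python
# Hypothetical stand-in for a real stemmer (NOT the Porter algorithm).
def toy_stem(word):
    # Strip a plural-like suffix from longer words.
    if word.endswith("es") and len(word) > 4:
        word = word[:-2]
    elif word.endswith("s") and not word.endswith("ss") and len(word) > 4:
        word = word[:-1]
    # Turn a trailing 'y' into 'i', which is how 'very' becomes 'veri'.
    if word.endswith("y") and len(word) > 2:
        word = word[:-1] + "i"
    return word


# Unstemmed stopwords, with every needed word form listed explicitly.
# This is an illustrative subset, not the real 120-word list.
STOPWORDS = {"her", "his", "i", "and", "very", "yourself", "yourselves"}


def index_terms_check_before(text):
    """Drop stopwords first, then stem whatever is left."""
    terms = []
    for token in text.lower().split():
        if token in STOPWORDS:
            continue
        terms.append(toy_stem(token))
    return terms


def index_terms_check_after(text):
    """Stem first, then compare the stem against the same unstemmed list."""
    terms = []
    for token in text.lower().split():
        stem = toy_stem(token)
        if stem in STOPWORDS:
            continue
        terms.append(stem)
    return terms


if __name__ == "__main__":
    sample = "very happy words and yourselves"
    print(index_terms_check_before(sample))
    print(index_terms_check_after(sample))
```

With the check done before stemming, 'very' and 'yourselves' are dropped
because they appear in the list as written. With the check done after
stemming, the same list misses them, since the stems no longer match the
unstemmed entries; that order only works if the list itself holds stems
like 'veri'.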

> It's time for beta1 and I'm not sure if we could work on this issue
> right now, but I feel a big pressure from tsearch users :-)
> If people want to help us why not to work on stop words list including
> all forms ? In any case, we are not native english, so don't expect we'll
> create more or less decent list. Programming changes are trivial, probably
> we'll end for the moment just using compile time option.
> As always, your patches are welcome !

I'm happy to work on the list of stopwords for you, Oleg. I agree this
might be a 7.4 thing though...

Chris
