Re: contrib/tsearch

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>, <martin_porter(at)softhome(dot)net>
Subject: Re: contrib/tsearch
Date: 2002-09-05 10:46:32
Message-ID: Pine.GSO.4.44.0209051313210.3967-100000@ra.sai.msu.su
Lists: pgsql-hackers

On Thu, 5 Sep 2002, Christopher Kings-Lynne wrote:

> Hmmm...thinking about it, maybe 'herring' is being reduced to 'her' after
> the stemming process and hence is thought to be a stopword? This is a bug,
> but how should it be fixed?
>

How to use stop words is a difficult question. We'll see what we can
do. Porter's stemming algorithm probably has a problem here:
'herring' -> 'her'~'ring'
(I have a demo of an English-Russian stemmer, so you can play with it.)
http://intra.astronet.ru/db/lingua/snowball/
I'll ask Martin Porter whether there could be an error in the stemmer.
But I think the problem is in the concept of using stop words.
Should we check for stop words before stemming or after?
In the first case we have to collect all forms of the stop words, which is
doable but difficult to maintain; in the latter we'll have the current problem.
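
To make the two orderings concrete, here is a rough sketch in Python. It is
only an illustration, not tsearch code: toy_stem is a hypothetical stand-in
that strips a trailing 'ing' and collapses a doubled final consonant (a small
piece of what a Porter-style stemmer does), and the stop word list is made up.

```python
# Toy illustration only: NOT tsearch's code and NOT the full Porter algorithm.
# STOPWORDS and both lexize_* helpers are hypothetical names for this sketch.
STOPWORDS = {"her", "his", "him", "the", "a"}
VOWELS = set("aeiou")


def toy_stem(word):
    """Strip a trailing 'ing', then collapse a doubled final consonant."""
    if word.endswith("ing") and any(c in VOWELS for c in word[:-3]):
        stem = word[:-3]
        if (len(stem) >= 2 and stem[-1] == stem[-2]
                and stem[-1] not in VOWELS and stem[-1] not in "lsz"):
            stem = stem[:-1]
        return stem
    return word


def lexize_stop_after_stem(word):
    """Stem first, then drop the result if it is a stop word."""
    stem = toy_stem(word)
    return None if stem in STOPWORDS else stem


def lexize_stop_before_stem(word):
    """Drop stop words first, then stem whatever is left."""
    if word in STOPWORDS:
        return None
    return toy_stem(word)


if __name__ == "__main__":
    for w in ("herring", "her", "himring"):
        print(w, lexize_stop_after_stem(w), lexize_stop_before_stem(w))
```

With the check after stemming, 'herring' is reduced to 'her' and dropped as a
stop word; with the check before stemming it survives, but then the stop word
list has to contain every surface form that should be ignored.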

It's time for beta1 and I'm not sure we can work on this issue
right now, but I feel a lot of pressure from tsearch users :-)
If people want to help us, why not work on a stop word list that includes
all forms? In any case, we are not native English speakers, so don't expect
us to create a more or less decent list. The programming changes are trivial;
for the moment we'll probably end up just using a compile-time option.
As always, your patches are welcome!

BTW, you can test your queries much more easily:

list=# select 'herring'::mquery_txt;
ERROR: Your query contained only stopword(s), ignored
list=# select 'herring'::query_txt;
query_txt
-----------
'herring'
(1 row)

> Although, tests don't support that:
>
> usa=# select food_id, brand,description,ftiidx from food_foods where ftiidx
> ## 'himring';
> food_id | brand | description | ftiidx
> ---------+-------+-------------+--------
> (0 rows)
> usa=# select food_id, brand,description,ftiidx from food_foods where ftiidx
> ## 'hisring';
> food_id | brand | description | ftiidx
> ---------+-------+-------------+--------
> (0 rows)
>
> usa=# select food_id, brand,description,ftiidx from food_foods where ftiidx
> ## 'hising';
> food_id | brand | description | ftiidx
> ---------+-------+-------------+--------
> (0 rows)
>
> usa=# select food_id, brand,description,ftiidx from food_foods where ftiidx
> ## 'himing';
> food_id | brand | description | ftiidx
> ---------+-------+-------------+--------
> (0 rows)
>
> All work...?
>
> Chris
>
> > -----Original Message-----
> > From: pgsql-hackers-owner(at)postgresql(dot)org
> > [mailto:pgsql-hackers-owner(at)postgresql(dot)org]On Behalf Of Christopher
> > Kings-Lynne
> > Sent: Thursday, 5 September 2002 2:36 PM
> > To: Hackers
> > Subject: [HACKERS] contrib/tsearch
> >
> >
> > Hi Oleg/Teodor,
> >
> > I'm sorry to keep posting bugs without patches, but I'm just
> > hoping you guys
> > know the answer faster than I...I know you're busy.
> >
> > What does tsearch have against the word 'herring' (as in the
> > fish). Why is
> > it considered a stopword?
> >
> > Attached is example queries...
> >
> > Chris
> >

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
