Re: Mailing list search engine: surprising missing results?

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Oleg Bartunov <obartunov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, James Addison <jay(at)jp-hosting(dot)net>, PostgreSQL WWW <pgsql-www(at)postgresql(dot)org>
Subject: Re: Mailing list search engine: surprising missing results?
Date: 2022-01-25 12:43:48
Message-ID: 22d5245c9c5a9aa05a0510bdd52458812140a870.camel@cybertec.at
Lists: pgsql-www

On Tue, 2022-01-25 at 14:04 +0300, Oleg Bartunov wrote:
> On Mon, Jan 24, 2022 at 11:47 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > > On Mon, Jan 24, 2022 at 08:27:41AM +0100, Laurenz Albe wrote:
> > > > The reason is that the 'moore' in 'boyer-moore' is stemmed, since it
> > > > is at the end of the word, while the 'moore' in 'Boyer-Moore-Horspool'
> > > > isn't:
> >
> > > Wow, he showed me this problem earlier but I never suspected it was
> > > a stemming issue because I never considered proper nouns could be
> > > stem-adjusted, but it is obvious they can.
> >
> > I wonder if we should change that so that components of a compound
> > word are consistently stemmed the same way.
>
> Something like this
>
> SELECT to_tsvector('english', 'Boyer-Moore-Horspool');
>                        to_tsvector
> ----------------------------------------------------------
>  'boyer':2 'boyer-moore-horspool':1 'boyer-moore':1  'moore-horspool':1  'horspool':4 'moor':3
> (1 row)

Not quite. The problem in question is the "'boyer-moore':1".
If that were "'boyer-moor':1" instead, the problem would disappear.
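
To see the mismatch directly, something like the following could be run
in psql. This is only a sketch: it assumes the stock 'english' text search
configuration, and the exact lexemes shown may vary between PostgreSQL
versions, so no output is listed here.

```sql
-- Document side: the lexemes stored for the three-part compound.
SELECT to_tsvector('english', 'Boyer-Moore-Horspool');

-- Query side: the lexemes the two-part search term is turned into.
SELECT to_tsquery('english', 'boyer-moore');

-- The match itself; this is the combination reported as a missing result.
SELECT to_tsvector('english', 'Boyer-Moore-Horspool')
       @@ to_tsquery('english', 'boyer-moore') AS matches;
```

Comparing the first two results should show whether the stemmed compound
the query asks for has any counterpart among the document's lexemes.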

Yours,
Laurenz Albe
