From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> |
Cc: | Oleg Bartunov <obartunov(at)postgrespro(dot)ru>, Bruce Momjian <bruce(at)momjian(dot)us>, James Addison <jay(at)jp-hosting(dot)net>, PostgreSQL WWW <pgsql-www(at)postgresql(dot)org> |
Subject: | Re: Mailing list search engine: surprising missing results? |
Date: | 2022-01-25 16:22:33 |
Message-ID: | 2257661.1643127753@sss.pgh.pa.us |
Lists: | pgsql-www |
Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> writes:
> On Tue, 2022-01-25 at 14:04 +0300, Oleg Bartunov wrote:
>> On Mon, Jan 24, 2022 at 11:47 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>>>> On Mon, Jan 24, 2022 at 08:27:41AM +0100, Laurenz Albe wrote:
>>>>> The reason is that the 'moore' in 'boyer-moore' is stemmed, since it
>>>>> is at the end of the word, while the 'moore' in 'Boyer-Moore-Horspool'
>>>>> isn't:
> Not quite. The problem in question is the "'boyer-moore':1".
> If that were "'boyer-moor':1" instead, the problem would disappear.
Actually, when I try this here, it seems like the stemming *is*
consistent:
regression=# SELECT to_tsvector('english', 'Boyer-Moore-Horspool');
to_tsvector
----------------------------------------------------------
'boyer':2 'boyer-moore-horspool':1 'horspool':4 'moor':3
(1 row)
regression=# SELECT to_tsvector('english', 'Boyer-Moore');
to_tsvector
-----------------------------------
'boyer':2 'boyer-moor':1 'moor':3
(1 row)
If you try variants of that where the first or third term is stemmable,
say
regression=# SELECT to_tsvector('english', 'Boyers-Moore-Horspool');
to_tsvector
-----------------------------------------------------------
'boyer':2 'boyers-moore-horspool':1 'horspool':4 'moor':3
(1 row)
it sure appears that each component word is stemmed independently
already. So I think the original explanation here is wrong and
we need to probe more closely.
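
As a rough sketch of how one might probe further, the queries below compare the query side with the document side and show the parser's token-by-token view. The search string 'Boyer-Moore' and the use of to_tsquery are assumptions for illustration; the archive search may build its query differently.

```sql
-- Assumption: the search term is 'Boyer-Moore' and is parsed with to_tsquery.
-- Show which lexemes the query side produces.
SELECT to_tsquery('english', 'Boyer-Moore');

-- Check whether that query matches the longer hyphenated term.
SELECT to_tsvector('english', 'Boyer-Moore-Horspool')
       @@ to_tsquery('english', 'Boyer-Moore') AS matches;

-- Show how the parser splits the document text, token by token,
-- and which lexemes the dictionaries return for each token.
SELECT alias, token, lexemes
FROM ts_debug('english', 'Boyer-Moore-Horspool');
```

If the second query returns false, comparing the first query's lexemes against the tsvector output above should show which one has no counterpart.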
regards, tom lane