Re: Text search prefix matching and stop words

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
Cc: mnelson(at)binarykeep(dot)com, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Text search prefix matching and stop words
Date: 2021-10-08 21:06:27
Message-ID: 8755.1633727187@sss.pgh.pa.us
Lists: pgsql-bugs

Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com> writes:
>> Prefix matching should not omit stop words, as matching lexemes may
>> legitimately begin with stop words.

> I am not sure that it is a bug. I think this is just how the to_tsquery
> conversion works: stop words first, then template processing.

I concur with the OP that this is a bug, or at least that it'd be nice
if it worked better. But I'm not sure we can make it better. The basic
design of our text search stuff combined the functions of normalization
and stop-word-suppression into a single dictionary stack, so that it's
impossible to ask for just one of those to happen. But if we skip
applying the dictionaries at all for a prefix item, then word
normalization doesn't happen, which would create a different set of
unexpected-failure-to-match conditions. (So your proposed workaround
of casting directly to tsquery just moves the problem somewhere else.)
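
To make the trade-off concrete, here is a toy model in Python. It is only
a sketch and not PostgreSQL's actual code: the stop-word list, the crude
"stemmer", and the names toy_dictionary, index_lexemes and prefix_matches
are all assumptions made up for illustration. It shows a single dictionary
step that both normalizes and drops stop words, and the two ways a prefix
item can then fail to match.

```python
# Toy model only. Every name and rule here is hypothetical and is NOT
# taken from PostgreSQL's text search implementation.
STOP_WORDS = {"the", "a", "of"}


def toy_dictionary(token):
    """Normalize a token and suppress stop words in one inseparable step.

    Returns the normalized lexeme, or None when the token is a stop word.
    """
    lexeme = token.lower()
    # Crude stand-in for stemming: strip one trailing "s" from longer words.
    if lexeme.endswith("s") and len(lexeme) > 3:
        lexeme = lexeme[:-1]
    if lexeme in STOP_WORDS:
        return None
    return lexeme


def index_lexemes(text):
    """Run every word of a document through the dictionary."""
    return {lex for lex in map(toy_dictionary, text.split()) if lex is not None}


def prefix_matches(prefix_lexeme, lexemes):
    """Prefix match; a query item that was discarded (None) matches nothing."""
    if prefix_lexeme is None:
        return False
    return any(lex.startswith(prefix_lexeme) for lex in lexemes)


doc = index_lexemes("The Theatre opens")

# Through the dictionary: "the" is suppressed, so the prefix item is lost.
stopword_via_dictionary = prefix_matches(toy_dictionary("the"), doc)

# Bypassing the dictionary: the stop-word prefix now matches ...
stopword_bypassing = prefix_matches("the", doc)

# ... but an un-normalized prefix no longer does.
mixed_case_bypassing = prefix_matches("Theat", doc)
mixed_case_via_dictionary = prefix_matches(toy_dictionary("Theat"), doc)
```

In this model, bypassing the dictionary fixes the stop-word prefix and
breaks the mixed-case one, which is the "moves the problem somewhere else"
effect described above.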

I think we could only fix this with a dictionary API change that
allows telling the dictionaries not to suppress stopwords. Not
sure how practical that is. If we'd had the prefix-match feature
from the beginning, maybe it'd have occurred to us that we needed
that API option ... but we didn't.
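
As a rough illustration of what such an option might look like, here is a
toy sketch in Python. The keep_stopwords parameter and the functions
toy_dictionary and toy_query_item are hypothetical and do not correspond
to any existing PostgreSQL dictionary interface; the point is only that
normalization and stop-word suppression become separately controllable.

```python
# Toy model only. The keep_stopwords flag is a hypothetical API option,
# not something PostgreSQL's dictionary interface provides.
STOP_WORDS = {"the", "a", "of"}


def toy_dictionary(token, keep_stopwords=False):
    """Always normalize; suppress stop words only when not told to keep them."""
    lexeme = token.lower()
    if not keep_stopwords and lexeme in STOP_WORDS:
        return None
    return lexeme


def toy_query_item(token, is_prefix):
    """Prefix items ask the dictionary to keep stop words; others do not."""
    return toy_dictionary(token, keep_stopwords=is_prefix)


prefix_item = toy_query_item("The", is_prefix=True)    # normalized and kept
plain_item = toy_query_item("The", is_prefix=False)    # suppressed as before
ordinary_word = toy_query_item("Theatre", is_prefix=False)
```

With a flag of this kind, a prefix item would still be normalized but would
survive even when it happens to be a stop word, while ordinary query items
would behave as they do today.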

regards, tom lane
