From: | Florian Pflug <fgp(at)phlo(dot)org> |
---|---|
To: | sushant354(at)gmail(dot)com |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: lexemes in prefix search going through dictionary modifications |
Date: | 2011-10-25 17:27:07 |
Message-ID: | 5A1A958A-6F52-4112-A28C-540B6AFBA34A@phlo.org |
Lists: | pgsql-hackers |
On Oct25, 2011, at 18:47 , Sushant Sinha wrote:
> On Tue, 2011-10-25 at 18:05 +0200, Florian Pflug wrote:
>> On Oct25, 2011, at 17:26 , Sushant Sinha wrote:
>>> I am currently using the prefix search feature in text search. I find
>>> that the prefix characters are treated the same as a normal lexeme and
>>> passed through stemming and stopword dictionaries. This seems like a bug
>>> to me.
>>
>> Hm, I don't think so. If they don't pass through stopword dictionaries,
>> then queries containing stopwords will fail to find any rows - which is
>> probably not what one would expect.
>
> I think what you are saying a feature is really a bug. I am fairly sure
> that when someone says to_tsquery('english', 's:*') one is looking for
> an entry that has a *non-stopword* word that starts with 's'. And
> specially so in a text search configuration that eliminates stop words.
But the whole idea of removing stopwords from the query is that users
*don't* need to be aware of the precise list of stopwords. The way I see
it, stopwords are simply an optimization that helps reduce the size of
your fulltext index.
Assume, for example, that the postgres mailing list archive search used
tsearch (which I think it does, but I'm not sure). It'd then probably make
sense to add "postgres" to the list of stopwords, because it's bound to
appear in nearly every mail. But would you want searches which include
'postgres*' to turn up empty? Quite certainly not.
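To make the argument concrete, here is a minimal toy model in Python. It is not how tsearch is implemented; the stopword list, the `:*` suffix parsing, and the helper names (`index_doc`, `parse_query`, `search`) are all assumptions made up for illustration. It only shows that once a word has been dropped from the index as a stopword, a prefix term for that word can never match unless the query side drops it too. The toy does not model how a query consisting only of stopwords is handled.

```python
# Toy model only: hypothetical stopword list, not a real tsearch configuration.
STOPWORDS = {"postgres", "the", "a"}


def index_doc(text):
    """Lexemes stored for a document, with stopwords removed at index time."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def parse_query(query, filter_prefix_terms):
    """Split a query into (word, is_prefix) terms.

    Plain terms always pass through the stopword filter. Prefix terms
    (written 'word:*' here) pass through it only if filter_prefix_terms
    is True.
    """
    terms = []
    for raw in query.lower().split():
        is_prefix = raw.endswith(":*")
        word = raw[:-2] if is_prefix else raw
        if word in STOPWORDS and (filter_prefix_terms or not is_prefix):
            continue  # drop the stopword from the query
        terms.append((word, is_prefix))
    return terms


def matches(doc_lexemes, terms):
    """True if every term (ANDed together) matches the document."""
    def hit(word, is_prefix):
        if is_prefix:
            return any(lex.startswith(word) for lex in doc_lexemes)
        return word in doc_lexemes
    return all(hit(word, is_prefix) for word, is_prefix in terms)


DOCS = ["postgres vacuum tuning", "the postgres planner"]
INDEXED = [index_doc(d) for d in DOCS]


def search(query, filter_prefix_terms):
    terms = parse_query(query, filter_prefix_terms)
    return [d for d, lex in zip(DOCS, INDEXED) if matches(lex, terms)]


# Prefix term goes through the stopword filter: the query still finds the mail.
print(search("postgres:* vacuum", filter_prefix_terms=True))
# Prefix term bypasses the filter: nothing in the index starts with "postgres".
print(search("postgres:* vacuum", filter_prefix_terms=False))
```

With the filter applied to prefix terms, the first call returns the "postgres vacuum tuning" document; with the filter bypassed, the second call returns an empty list, which is the surprising result described above.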
> Does it even make sense to stem, abbreviate, synonym for a few letters?
> It will be so unpredictable.
That depends on the language. In German (my native tongue), one can
concatenate nouns to form new nouns. It's thus not entirely unreasonable
that one would want the prefix to be stemmed to its singular form before
being matched.
Also, suppose you're using a dictionary which corrects common typos. Who
says you wouldn't want that to be applied to prefix queries?
best regards,
Florian Pflug