From: | Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru> |
---|---|
To: | Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Flexible configuration for full-text search |
Date: | 2018-08-29 08:38:31 |
Message-ID: | 20180829153831.6b66d264@asp437-ThinkPad-L380 |
Lists: | pgsql-hackers |
On Tue, 28 Aug 2018 12:40:32 +0700
Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru> wrote:
>On Fri, 24 Aug 2018 18:50:38 +0300
>Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
>>Agreed, backward compatibility is important here. Probably we should
>>leave old dictionaries for that. But I just meant that if we
>>introduce new (better) way of stop words handling and encourage users
>>to use it, then it would look strange if default configurations work
>>the old way...
>
>I agree with Alexander. The only drawback I see is that after addition
>of new dictionaries, there will be 3 dictionaries for each language:
>old one, stop-word filter for the language, and stemmer dictionary.
During work on the new version of the patch, I found an issue in the
proposed syntax. At the beginning of the conversation, there was a
suggestion to split stop word filtering and word normalization. At this
stage of development, we can use a different dictionary for stop word
detection, but if we drop the word, the word counter isn't increased
and the stop word is processed as an unknown word.
Currently, I see two solutions:
1) Keep the old way of stop word filtering. The drawback of this
approach is that word normalization and stop word detection logic are
mixed inside a dictionary. This can be solved by using the 'simple'
dictionary in accept=false mode as a stop word filter.
2) Add a STOPWORD action alongside KEEP and DROP (which is not implemented in
the previous patch, but I think it is good to have both of them),
meaning "increase the word counter but don't add the lexeme to the vector".
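To make the difference between dropping a word and treating it as a
stop word concrete, here is a toy Python model of the word counter. It
is only a sketch and not the patch's code: the Action enum, the
build_vector() helper, the stop word list, and the assumption that DROP
leaves the counter untouched are all made up for illustration.

```python
from enum import Enum


class Action(Enum):
    KEEP = "keep"          # add the lexeme and advance the word counter
    DROP = "drop"          # discard the token; counter unchanged (assumed)
    STOPWORD = "stopword"  # advance the word counter, add nothing


def build_vector(tokens, classify):
    """Return {lexeme: [positions]} for a toy tsvector-like structure."""
    vector = {}
    position = 0
    for token in tokens:
        action = classify(token)
        if action is Action.DROP:
            # The token vanishes without consuming a position.
            continue
        position += 1
        if action is Action.KEEP:
            vector.setdefault(token, []).append(position)
    return vector


STOP_WORDS = {"the", "a", "on"}  # hypothetical stop word list


def with_drop(token):
    return Action.DROP if token in STOP_WORDS else Action.KEEP


def with_stopword(token):
    return Action.STOPWORD if token in STOP_WORDS else Action.KEEP


tokens = "the cat sat on a mat".split()
dropped = build_vector(tokens, with_drop)      # positions collapse
counted = build_vector(tokens, with_stopword)  # positions preserved
```

In this model, with_drop puts "mat" at position 3, while with_stopword
keeps it at position 6, so the distance between the remaining lexemes
still reflects the original text.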
Any suggestions on the issue?
--
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company