From: | egocenter <egocenter(at)yandex(dot)ru> |
---|---|
To: | Artur Zakirov <zaartur(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: Full text search bug ('russian' regconfig) |
Date: | 2020-02-20 08:21:24 |
Message-ID: | 9210322148.20200220112124@yandex.ru |
Lists: | pgsql-bugs |
Hello, Artur!
Thanks for the answer.
OK, though it is strange that only one word is affected that way (as if two lexemes existed for a single word)...
I call to_tsvector twice to eliminate duplicate words.
In the example below, ts_title = 'histori':2 'watcom':1,3
and ts_rank_cd reports 2 entries for the query 'город - watcom'.
I need to count UNIQUE word entries, but the standard functionality does not seem to allow that.
(I see two ways: a custom ts_rank function, OR post-processing the to_tsvector result so that only the first position is kept for 'watcom':
ts_title = 'histori':2 'watcom':1).
If you have any ideas about this situation, I would highly appreciate them! Thanks in advance.
---------
SELECT
round((ts_rank_cd(ts_title, web_query_or)/0.1)::NUMERIC, 0) AS title_entries_count, -- 2, but should be 1
*
FROM
(SELECT
to_tsvector('russian', 'watcom history | watcom') AS ts_title,
websearch_to_tsquery('russian', REPLACE('город - watcom', '- ' , '')) AS web_query_and, -- the dash is removed so that it is not converted into negation (!)
REPLACE(websearch_to_tsquery('russian', REPLACE('город - watcom', '- ' , ''))::TEXT, '&', '|')::tsquery AS web_query_or
) AS main;
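The second option mentioned above (keeping only the first position of each lexeme) could be sketched roughly as follows. This is only an illustration under assumptions, not a tested solution: it assumes PostgreSQL 9.6 or later (for unnest(tsvector)), the alias names first_pos_title, src and u are made up for the example, and any weights in the original tsvector are dropped.

```sql
-- Sketch: rebuild the tsvector so that each lexeme keeps only its first position.
-- Assumptions: PostgreSQL 9.6+ (unnest(tsvector)), a tsvector that carries positions,
-- and lexemes without backslashes (quote_literal output is cast back to tsvector).
SELECT
    ts_title,
    first_pos_title,
    round((ts_rank_cd(first_pos_title, web_query_or) / 0.1)::numeric, 0) AS title_entries_count
FROM (
    SELECT
        ts_title,
        web_query_or,
        -- unnest(tsvector) yields one row per lexeme: (lexeme, positions, weights).
        -- positions[1] is the first position; weights are intentionally ignored.
        (SELECT string_agg(quote_literal(u.lexeme) || ':' || u.positions[1], ' ')
           FROM unnest(ts_title) AS u)::tsvector AS first_pos_title
    FROM (
        SELECT
            to_tsvector('russian', 'watcom history | watcom') AS ts_title,
            replace(websearch_to_tsquery('russian', 'город watcom')::text, '&', '|')::tsquery AS web_query_or
    ) AS src
) AS main;
```

With only one position per lexeme, ts_rank_cd has a single occurrence of 'watcom' to count, which is the behaviour asked for here; whether dividing by 0.1 remains a reliable count for longer queries is not verified.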
--
> Hello
> On 2/19/2020 5:35 PM, egocenter wrote:
>> Text search doesn't work correctly when the text and the query are the SAME string (russian dictionary config).
>> As you can see in the example, the tsvector receives lexemes that differ from those of the tsquery for identical text:
>>
>> tsv = 'дан':1 'магазин':2 'нужн':3 'посеща':4 'точн':5
>> tsq = 'нужн' & 'точн' & 'дан' & 'посещаем' & 'магазин'
> It is because you call to_tsvector() twice. 'russian' is a Snowball
> dictionary and it uses stemming algorithms to strip word endings. Your
> query works if to_tsvector() isn't called twice on the same text:
> =# SELECT
> web_query_and @@ ts_title,
> web_query_and @@ 'зачем нужны точные данные о посещаемости магазинов',
> *
> FROM
> (SELECT
> to_tsvector('russian', 'зачем нужны точные данные о посещаемости
> магазинов') AS ts_title,
> websearch_to_tsquery('russian', 'зачем нужны точные данные о
> посещаемости магазинов?') AS web_query_and
> ) AS main;
> It gives 'true' for the first column.