Re: TO_TSVECTOR acts differently with national characters

From: Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
To: Mart Palmas <Mart(dot)Palmas(at)datel(dot)ee>
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: TO_TSVECTOR acts differently with national characters
Date: 2017-08-24 19:01:34
Message-ID: 20170824190134.GA1699@arthur.localdomain
Lists: pgsql-bugs

On Tue, Aug 22, 2017 at 08:53:45AM +0000, Mart Palmas wrote:
>
> The string is converted to a vector differently when the string contains national characters "äöüõžš".
>

I suppose this is true for all non-ASCII characters. It could be fixed by
patching the text search parser, but some users might not be happy about
that, because it could break backward compatibility.

> Results are:
> 'bar' 'foo' 'toop/6'
> '/6' 'bar' 'foo' 'tüüp'

Do you expect the first result or the second one?

Some users may not want words to be split on the "/" character, because
"toop/6" can be a path:

=# select * from ts_debug('simple', 'toop/6');
 alias |    description    | token  | dictionaries | dictionary | lexemes
-------+-------------------+--------+--------------+------------+----------
 file  | File or path name | toop/6 | {simple}     | simple     | {toop/6}
(1 row)
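For comparison, the same ts_debug call can be run on the variant containing national characters. This is only a sketch of how to inspect the difference; its output is not reproduced here. Judging from the to_tsvector results quoted above, 'tüüp' and '/6' would presumably come back as separate tokens rather than as a single "File or path name" token, but the exact alias and description columns are not claimed here.

```sql
-- ASCII-only input: reported above as a single file/path token.
SELECT alias, token
FROM ts_debug('simple', 'toop/6');

-- Same input with non-ASCII letters: check whether the parser
-- still produces one file/path token or splits it at "/".
SELECT alias, token
FROM ts_debug('simple', 'tüüp/6');
```

Comparing the alias column of the two queries shows which token type the parser assigned in each case.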

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
