Re: Tsearch2 and Unicode?

From: "Markus Wollny" <Markus(dot)Wollny(at)computec(dot)de>
To: "Pgsql General" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Tsearch2 and Unicode?
Date: 2004-11-18 10:08:38
Message-ID: 2266D0630E43BB4290742247C89105750680F7C4@dozer.computec.de
Lists: pgsql-general

Hi!

Oleg, what exactly do you mean by "tsearch2 doesn't support unicode yet"?

It seems to work fine in my database:

./pg_controldata [mycluster] gives me
pg_control version number: 72
[...]
LC_COLLATE: de_DE.UTF-8
LC_CTYPE: de_DE.UTF-8

community_unicode=# SELECT pg_encoding_to_char(encoding) AS encoding FROM pg_database WHERE datname='community_unicode';
encoding
----------
UNICODE
(1 row)

community_unicode=# select to_tsvector('default_german', 'Ich fände, daß das Fehlen von Umlauten ein Ärgernis wäre.');
to_tsvector
------------------------------------------------------------------
'daß':3 'wäre':10 'fehlen':5 'fände':2 'umlauten':7 'Ärgernis':9
(1 row)

community_unicode=# SELECT message_id
community_unicode-# , rank(idxfti, to_tsquery('default_german', 'Könige|Söldner'),0) as rank
community_unicode-# FROM ct_com_board_message
community_unicode-# WHERE idxfti @@ to_tsquery('default_german', 'Könige|Söldner')
community_unicode-# order by rank desc
community_unicode-# limit 10;
message_id | rank
------------+----------
3191632 | 0.686189
2803233 | 0.686189
2935325 | 0.686189
2882337 | 0.686189
2842006 | 0.686189
2854329 | 0.686189
2841962 | 0.686189
2999851 | 0.651322
2869839 | 0.651322
2999799 | 0.61258
(10 rows)

These results look alright to me, so I cannot reproduce the disappearing special characters in a Unicode database. Dawid, are you sure you ran initdb for your cluster with the correct locale settings?
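
In case it helps, here is a quick way to check from inside psql. This is only a sketch: it assumes your server version exposes lc_collate, lc_ctype and server_encoding as read-only settings via SHOW. If it does not, pg_controldata reports the locale values as shown above.

```sql
-- Assumption: these read-only settings are available via SHOW on your server version.
SHOW lc_collate;       -- locale used for sorting
SHOW lc_ctype;         -- locale used for character classification (upper/lower, letter tests)
SHOW server_encoding;  -- encoding of the database you are connected to
```

If lc_ctype comes back as C or POSIX while the database encoding is UNICODE, that mismatch would be the first thing I would look at.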

Kind regards

Markus

> -----Original Message-----
> From: pgsql-general-owner(at)postgresql(dot)org
> [mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of
> Oleg Bartunov
> Sent: Wednesday, 17 November 2004 17:32
> To: Dawid Kuroczko
> Cc: Pgsql General
> Subject: Re: [GENERAL] Tsearch2 and Unicode?
>
> Dawid,
>
> unfortunately, tsearch2 doesn't support unicode yet.
> If you keep tsvector separately from the data, then you'll need
> one more join.
>
> Oleg
>
