From: | "Markus Wollny" <Markus(dot)Wollny(at)computec(dot)de> |
---|---|
To: | "Pgsql General" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Tsearch2 and Unicode? |
Date: | 2004-11-18 10:08:38 |
Message-ID: | 2266D0630E43BB4290742247C89105750680F7C4@dozer.computec.de |
Lists: | pgsql-general |
Hi!
Oleg, what exactly do you mean by "tsearch2 doesn't support unicode yet"?
It seems to work fine in my database:
./pg_controldata [mycluster] gives me
pg_control version number: 72
[...]
LC_COLLATE: de_DE.UTF-8
LC_CTYPE: de_DE.UTF-8
community_unicode=# SELECT pg_encoding_to_char(encoding) AS encoding FROM pg_database WHERE datname='community_unicode';
encoding
----------
UNICODE
(1 row)
community_unicode=# select to_tsvector('default_german', 'Ich fände, daß das Fehlen von Umlauten ein Ärgernis wäre.');
to_tsvector
------------------------------------------------------------------
'daß':3 'wäre':10 'fehlen':5 'fände':2 'umlauten':7 'Ärgernis':9
(1 row)
community_unicode=# SELECT message_id
community_unicode-# , rank(idxfti, to_tsquery('default_german', 'Könige|Söldner'),0) as rank
community_unicode-# FROM ct_com_board_message
community_unicode-# WHERE idxfti @@ to_tsquery('default_german', 'Könige|Söldner')
community_unicode-# order by rank desc
community_unicode-# limit 10;
message_id | rank
------------+----------
3191632 | 0.686189
2803233 | 0.686189
2935325 | 0.686189
2882337 | 0.686189
2842006 | 0.686189
2854329 | 0.686189
2841962 | 0.686189
2999851 | 0.651322
2869839 | 0.651322
2999799 | 0.61258
(10 rows)
These results look all right to me, so I cannot reproduce the disappearing special characters in a Unicode database. Dawid, are you sure you ran initdb on your cluster with the correct locale settings?
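As a rough sketch, the same settings can also be checked from inside psql instead of via pg_controldata. This assumes the server exposes lc_ctype and lc_collate as read-only settings, which not every release does:

```sql
-- Encoding of the database this session is connected to
SHOW server_encoding;

-- Encoding the client session sends and expects
SHOW client_encoding;

-- Locale settings fixed when the cluster was initialised
-- (assumption: available as read-only settings on this release)
SHOW lc_ctype;
SHOW lc_collate;
```

If lc_ctype and lc_collate do not name a UTF-8 locale while the database encoding is UNICODE, that mismatch would be the first thing I would look at.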
Kind regards
Markus
> -----Original Message-----
> From: pgsql-general-owner(at)postgresql(dot)org
> [mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of
> Oleg Bartunov
> Sent: Wednesday, 17 November 2004 17:32
> To: Dawid Kuroczko
> Cc: Pgsql General
> Subject: Re: [GENERAL] Tsearch2 and Unicode?
>
> Dawid,
>
> unfortunately, tsearch2 doesn't support unicode yet.
> If you keep tsvector separately from data then you'll need
> one more join.
>
> Oleg
>