From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
---|---|
To: | Dawid Kuroczko <qnex42(at)gmail(dot)com> |
Cc: | Pgsql General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Tsearch2 and Unicode? |
Date: | 2004-11-17 16:31:55 |
Message-ID: | Pine.GSO.4.61.0411171927480.18871@ra.sai.msu.su |
Lists: | pgsql-general |
Dawid,
Unfortunately, tsearch2 doesn't support Unicode yet.
If you keep the tsvector separately from the data, then you'll need one more join.
Oleg
On Wed, 17 Nov 2004, Dawid Kuroczko wrote:
> I'm trying to use tsearch2 with database which is in 'UNICODE' encoding.
> It works fine for English text, but as I intend to search Polish texts I did:
>
> insert into pg_ts_cfg values ('default_polish', 'default', 'pl_PL.UTF-8');
> (and I updated other pg_ts_* tables as written in manual).
>
> However, Polish-specific chars are being eaten alive, it seems.
> I.e. doing select to_tsvector('default_polish', body) from messages;
> results in list of words but with national chars stripped...
>
> I wonder, am I doing something wrong, or just tsearch2 doesn't grok
> Unicode, despite the locales setting? This also is a good question
> regarding ispell_dict and its feelings regarding Unicode, but that's
> another story.
>
> Assuming Unicode is unsupported, I should perhaps... oh, convert
> the data to iso8859 prior to feeding it to to_tsvector()... interesting idea,
> but so far I have failed to actually do it. Maybe store the data as
> 'bytea' and add a column with encoding information (assuming I don't
> want to recreate the whole database with a new encoding, and that I want
> to use Unicode for some columns, so I don't have to keep the encoding
> with every text everywhere...).
>
> And while we are at it, how do you feel -- an extra column with tsvector
> and its index -- would it be OK to keep it away from my data (so I can
> safely get rid of them if need be)?
> [ I intend to keep index of around 2 000 000 records, few KBs of
> text each ]...
>
> Regards,
> Dawid Kuroczko
>
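The "convert the data to iso8859" idea quoted above can be sketched on the client side. This is only an illustration under stated assumptions, not something from the thread: it assumes ISO-8859-2 (Latin-2) as the target, since the message says only "iso8859", and the helper names `to_latin2` and `fits_encoding` are hypothetical. It shows that the Polish diacritics fit in ISO-8859-2 but not in ISO-8859-1, which matters when choosing a single-byte encoding.

```python
# Illustrative sketch: transcode Polish text to a single-byte encoding
# on the client before handing it to an indexer that is not Unicode-aware.
# Assumption: ISO-8859-2 is the target; the original message does not say which part.

POLISH_SAMPLE = "Zażółć gęślą jaźń"  # sample text containing Polish diacritics


def to_latin2(text: str) -> bytes:
    """Encode text as ISO-8859-2.

    Raises UnicodeEncodeError if a character has no Latin-2 representation.
    """
    return text.encode("iso8859_2")


def fits_encoding(text: str, encoding: str) -> bool:
    """Return True if every character of text can be encoded in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    latin2 = to_latin2(POLISH_SAMPLE)
    print(len(POLISH_SAMPLE.encode("utf-8")), "bytes as UTF-8")
    print(len(latin2), "bytes as ISO-8859-2")
    print("fits ISO-8859-1:", fits_encoding(POLISH_SAMPLE, "iso8859_1"))
    print("fits ISO-8859-2:", fits_encoding(POLISH_SAMPLE, "iso8859_2"))
```

Text that mixes in characters outside Latin-2 would make `to_latin2` raise, so a real pipeline would need to decide how to handle such rows.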
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83