Postgresql8.1.3 tsearch2 with UTF8

From: "Raphael Bolfing" <jackflash(at)gmx(dot)ch>
To: pgsql-admin(at)postgresql(dot)org
Subject: Postgresql8.1.3 tsearch2 with UTF8
Date: 2006-05-10 08:28:59
Message-ID: 6399.1147249739@www051.gmx.net
Lists: pgsql-admin

Hi,

My task is to upgrade our web server with tsearch2 from SuSE 8.2 / PostgreSQL 7.4.1
to SuSE 9.3 / PostgreSQL 8.1.3 with tsearch2.
The services are running, but I have some problems with the tsearch2 configuration.

-------------------------------------------------------------------------------------------------------------------------------
old system:
SuSE 8.2
PostgreSQL 7.4.1
tsearch2 (guide: "References" at
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch2-ref.html)
From this guide we followed the chapters "Configuration" and "Parser".

new system:
SuSE 9.3
PostgreSQL 8.1.3
tsearch2 (two guides: tsearch2 with UTF-8)
-------------------------------------------------------------------------------------------------------------------------------

My steps:
1. I downloaded the new tsearch2.8.2.tar.gz for UTF-8 and replaced the
   tsearch2 folder.
2. Installed tsearch2 with make && make install, without problems.
3. Locale: de_DE.UTF-8
4. I downloaded the *.med, *.aff and *.stop files from sai.msu.su/
   (tsearch2_german_utf8.zip, German ispell dictionary (UTF-8)) and
   extracted them into /var/lib/ispell/.
5. Compiled the German Snowball stemmer from stem.c and stem.h
   (make && make install) in /dict_de/..
6. Then I restored our database:
   psql -d codasdb -f dump.sql
   psql -d codasdb -f tsearch2.sql
   psql -d codasdb -f dict_de.sql
7. I set dict_initoption='/var/lib/ispell/german.stop' where dict_name='de'
   in pg_ts_dict; ???
8. INSERT INTO pg_ts_cfg (ts_name, prs_name, locale)
       values ('default_german', 'default', 'de_DE.UTF-8');
   INSERT INTO pg_ts_dict (select 'de_ispell',
       dict_init,
       'DictFile="/var/lib/ispell/german.med",'
       'AffFile="/var/lib/ispell/german.aff",'
       'StopFile="/var/lib/ispell/german.stop"',
       dict_lexize
     FROM pg_ts_dict
     where dict_name = 'ispell_template');
9. SELECT set_curdict('de_ispell'); <- this doesn't work with de_ispell, so I
   used set_curdict('de') instead; ???
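For reference, the dict_initoption value built in step 8 is a single string of comma-separated Key="path" pairs; PostgreSQL joins the three adjacent string literals there because they are separated only by whitespace containing a newline. A minimal Python sketch that assembles the same string, handy for comparing against what ends up in pg_ts_dict. The helper name build_ispell_initoption is hypothetical, and only the three keys used above (DictFile, AffFile, StopFile) are assumed:

```python
def build_ispell_initoption(dict_file, aff_file, stop_file=None):
    """Assemble an ispell dict_initoption string of Key="path" pairs.

    Hypothetical helper: it only reproduces the format used in step 8,
    it does not talk to tsearch2 or the database.
    """
    pairs = [("DictFile", dict_file), ("AffFile", aff_file)]
    if stop_file is not None:
        pairs.append(("StopFile", stop_file))
    # Each pair becomes Key="value"; pairs are joined with commas, no spaces.
    return ",".join(f'{key}="{value}"' for key, value in pairs)


if __name__ == "__main__":
    print(build_ispell_initoption(
        "/var/lib/ispell/german.med",
        "/var/lib/ispell/german.aff",
        "/var/lib/ispell/german.stop",
    ))
```

With the three German files this prints the same value shown for de_ispell in the pg_ts_dict listing.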

select 'Our first string used today'::tsvector; <-- works

Now the Problem is:
codasdb=# select to_tsvector('PostgreSQL ist weitgehend konform mit dem
SQL92/SQL99-Standard, d.h. alle in dem Standard geforderten Funktionen
stehen zur Verfuegung und verhalten sich so, wie vom Standard gefordert;
dies ist bei manchen kommerziellen sowie nichtkommerziellen SQL-Datenbanken
bisweilen nicht gegeben.');
ERROR: invalid UTF-8 byte sequence detected near byte 0xe4
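One possible reading of this error, offered as an assumption rather than a diagnosis: 0xe4 is "ä" in ISO-8859-1 (Latin-1), while UTF-8 encodes "ä" as the two bytes 0xc3 0xa4, so a lone 0xe4 usually means Latin-1 data is reaching the UTF8 database from somewhere (for example a dictionary or stop file, or the client encoding). The Python sketch below is not part of tsearch2; the function names are hypothetical. It reports the first invalid UTF-8 byte in each file given on the command line, e.g. the files in /var/lib/ispell/:

```python
import sys


def first_invalid_utf8(data: bytes):
    """Return (offset, byte value) of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that could not be decoded.
        return err.start, data[err.start]
    return None


def check_file(path):
    """Print whether the file at path is valid UTF-8 (hypothetical helper)."""
    with open(path, "rb") as fh:
        result = first_invalid_utf8(fh.read())
    if result is None:
        print(f"{path}: valid UTF-8")
    else:
        offset, byte = result
        print(f"{path}: invalid UTF-8 near byte 0x{byte:02x} at offset {offset}")


if __name__ == "__main__":
    # a-umlaut is the single byte 0xe4 in Latin-1 but 0xc3 0xa4 in UTF-8.
    print("\u00e4".encode("latin-1"), "\u00e4".encode("utf-8"))
    for path in sys.argv[1:]:
        check_file(path)
```

For example, running it on german.med, german.aff and german.stop would show whether any of them still contains Latin-1 bytes.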

I've tested with these two guides:
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch2_german_utf8.html
http://www.tauceti.net/roller/page/cetixx/20060401 (German)

Can anyone help?

Raphi

----------------------------------------------------------------------------------------------------------------------------------------------------------
Configuration:

codasdb=# select * from pg_ts_cfg;
ts_name | prs_name | locale
-----------------+----------+--------------
default | default | C
default_russian | default | ru_RU.KOI8-R
utf8_russian | default | ru_RU.UTF-8
simple | default |
default_german | default | de_DE.UTF-8

codasdb=# \l
List of databases
Name | Owner | Encoding
-----------+----------+----------
codasdb | postgres | UTF8
postgres | postgres | UTF8
template0 | postgres | UTF8
template1 | postgres | UTF8

codasdb=# select * from pg_ts_dict;
dict_name | dict_init | dict_initoption | dict_lexize | dict_comment
-----------------+----------------------------+-------------------------------------------------------------------------------------------------------------------+-----------------------------------------+--------------------------------------------------
simple | dex_init(internal) | | dex_lexize(internal,internal,integer) | Simple example of dictionary.
en_stem | snb_en_init(internal) | contrib/english.stop | snb_lexize(internal,internal,integer) | English Stemmer. Snowball.
ru_stem_koi8 | snb_ru_init_koi8(internal) | contrib/russian.stop | snb_lexize(internal,internal,integer) | Russian Stemmer. Snowball. KOI8 Encoding
ru_stem_utf8 | snb_ru_init_utf8(internal) | contrib/russian.stop.utf8 | snb_lexize(internal,internal,integer) | Russian Stemmer. Snowball. UTF8 Encoding
ispell_template | spell_init(internal) | | spell_lexize(internal,internal,integer) | ISpell interface. Must have .dict and .aff files
synonym | syn_init(internal) | | syn_lexize(internal,internal,integer) | Example of synonym dictionary
de | dinit_de(internal) | /var/lib/ispell/german.stop | snb_lexize(internal,internal,integer) | Snowball stemmer for German
de_ispell | spell_init(internal) | DictFile="/var/lib/ispell/german.med",AffFile="/var/lib/ispell/german.aff",StopFile="/var/lib/ispell/german.stop" | spell_lexize(internal,internal,integer) |
(8 rows)
