questions about tsearch2 (for czech language)

From: Pavel Stehule <stehule(at)kix(dot)fsv(dot)cvut(dot)cz>
To: pgsql-general(at)postgresql(dot)org
Subject: questions about tsearch2 (for czech language)
Date: 2003-12-22 10:44:36
Message-ID: Pine.LNX.4.44.0312221128350.27697-100000@kix.fsv.cvut.cz
Lists: pgsql-general

Hello

I am trying tsearch2 in a Czech environment. It works fine, but I have two
questions.

1. I have the words "se" and "ve" in my Czech stop-word list, but they still
appear in the result. Why? Is there a problem with my configuration?

tsearch2=# select * from ts_debug('jmenuji se Pavel Stěhule a bydlím ve Skalici.');
    ts_name    | tok_type | description |  token  |  dict_name  | tsvector
---------------+----------+-------------+---------+-------------+-----------
 default_czech | lword    | Latin word  | jmenuji | {cz_ispell} | 'jmenuji'
 default_czech | lword    | Latin word  | se      | {cz_ispell} | 'se'
 default_czech | lword    | Latin word  | Pavel   | {cz_ispell} | 'pavel'
 default_czech | word     | Word        | Stěhule | {cz_ispell} |
 default_czech | lword    | Latin word  | a       | {cz_ispell} |
 default_czech | word     | Word        | bydlím  | {cz_ispell} | 'bydlet'
 default_czech | lword    | Latin word  | ve      | {cz_ispell} | 've'
 default_czech | lword    | Latin word  | Skalici | {cz_ispell} | 'skalici'
(8 rows)

tsearch2=# select * from pg_ts_cfgmap where ts_name='default_czech';
    ts_name    |  tok_alias   |  dict_name
---------------+--------------+-------------
 default_czech | email        | {simple}
 default_czech | file         | {simple}
 default_czech | float        | {simple}
 default_czech | host         | {simple}
 default_czech | hword        | {cz_ispell}
 default_czech | int          | {simple}
 default_czech | lhword       | {cz_ispell}
 default_czech | lpart_hword  | {cz_ispell}
 default_czech | lword        | {cz_ispell}
 default_czech | nlhword      | {cz_ispell}
 default_czech | nlpart_hword | {cz_ispell}
 default_czech | nlword       | {cz_ispell}
 default_czech | part_hword   | {simple}
 default_czech | sfloat       | {simple}
 default_czech | uint         | {simple}
 default_czech | uri          | {simple}
 default_czech | url          | {simple}
 default_czech | version      | {simple}
 default_czech | word         | {cz_ispell}
(19 rows)
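
One way to narrow down the stop-word question is to look at how the
dictionary itself is defined and what it returns for a single word. The
sketch below assumes the standard tsearch2 contrib tables and its lexize()
function; cz_ispell is the dictionary name from the map above, and whether a
StopFile option is actually set for it cannot be told from this output.

```sql
-- Show the init options of the dictionary. For an ispell dictionary,
-- DictFile, AffFile and (if configured) StopFile appear in dict_initoption.
select dict_name, dict_initoption
  from pg_ts_dict
 where dict_name = 'cz_ispell';

-- Ask the dictionary directly what it does with one of the stop words.
select lexize('cz_ispell', 'se');
```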

2. I use a small Czech dictionary. I need words that are not in the
dictionary (Stěhule in my example) to be kept rather than dropped. Can I set
this somewhere? I tried adding the simple dictionary to the cfg map, but
without success.
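
A statement of roughly the following form adds simple after cz_ispell in the
map. It is only a sketch: the WHERE clause is an assumption, written to match
the rows that pointed at cz_ispell alone in the pg_ts_cfgmap listing above.

```sql
-- Append simple as a fallback after cz_ispell for every token type
-- of default_czech that currently maps to cz_ispell alone.
update pg_ts_cfgmap
   set dict_name = '{cz_ispell,simple}'
 where ts_name = 'default_czech'
   and dict_name = '{cz_ispell}';
```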

tsearch2=# select * from ts_debug('jmenuji se Pavel Stěhule a bydlím ve Skalici.');
    ts_name    | tok_type | description |  token  |     dict_name      | tsvector
---------------+----------+-------------+---------+--------------------+-----------
 default_czech | word     | Word        | Stěhule | {cz_ispell,simple} |
 default_czech | lword    | Latin word  | a       | {cz_ispell,simple} |
 default_czech | word     | Word        | bydlím  | {cz_ispell,simple} | 'bydlet'

Thank you
Pavel Stehule
