From: | Pavel Stehule <stehule(at)kix(dot)fsv(dot)cvut(dot)cz> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | questions about tsearch2 (for czech language) |
Date: | 2003-12-22 10:44:36 |
Message-ID: | Pine.LNX.4.44.0312221128350.27697-100000@kix.fsv.cvut.cz |
Lists: | pgsql-general |
Hello,
I am trying out tsearch2 in a Czech environment. It works fine, but I have two
questions.
1. I have the words "se" and "ve" in my Czech stop word list, but they still
appear in the result. Why? Is there a problem with my configuration?
tsearch2=# select * from ts_debug('jmenuji se Pavel Stěhule a bydlím ve Skalici.');
    ts_name    | tok_type | description |  token  |  dict_name  | tsvector
---------------+----------+-------------+---------+-------------+-----------
 default_czech | lword    | Latin word  | jmenuji | {cz_ispell} | 'jmenuji'
 default_czech | lword    | Latin word  | se      | {cz_ispell} | 'se'
 default_czech | lword    | Latin word  | Pavel   | {cz_ispell} | 'pavel'
 default_czech | word     | Word        | Stěhule | {cz_ispell} |
 default_czech | lword    | Latin word  | a       | {cz_ispell} |
 default_czech | word     | Word        | bydlím  | {cz_ispell} | 'bydlet'
 default_czech | lword    | Latin word  | ve      | {cz_ispell} | 've'
 default_czech | lword    | Latin word  | Skalici | {cz_ispell} | 'skalici'
(8 rows)
tsearch2=# select * from pg_ts_cfgmap where ts_name='default_czech';
ts_name | tok_alias | dict_name
---------------+--------------+-------------
default_czech | email | {simple}
default_czech | file | {simple}
default_czech | float | {simple}
default_czech | host | {simple}
default_czech | hword | {cz_ispell}
default_czech | int | {simple}
default_czech | lhword | {cz_ispell}
default_czech | lpart_hword | {cz_ispell}
default_czech | lword | {cz_ispell}
default_czech | nlhword | {cz_ispell}
default_czech | nlpart_hword | {cz_ispell}
default_czech | nlword | {cz_ispell}
default_czech | part_hword | {simple}
default_czech | sfloat | {simple}
default_czech | uint | {simple}
default_czech | uri | {simple}
default_czech | url | {simple}
default_czech | version | {simple}
default_czech | word | {cz_ispell}
(19 rows)
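One way to narrow question 1 down is to look at how the dictionary was registered and to call it directly. This is only a sketch: it assumes cz_ispell was added to pg_ts_dict with a StopFile entry in its init option, which the output above does not show.

```sql
-- Show how the dictionary was registered. A stop word list can only take
-- effect if dict_initoption names it (assumed here: a StopFile="..." entry).
select dict_name, dict_initoption
  from pg_ts_dict
 where dict_name = 'cz_ispell';

-- Ask the dictionary directly, bypassing the parser and the configuration map.
select lexize('cz_ispell', 'se');
select lexize('cz_ispell', 've');
```

If lexize returns the word itself rather than an empty array, the dictionary is not treating that word as a stop word.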
2. I use a small Czech dictionary. I need words that are not in the
dictionary (Stěhule in my example) to be kept rather than dropped. Can I set
this somewhere? I tried adding the simple dictionary to the configuration map,
but without success:
tsearch2=# select * from ts_debug('jmenuji se Pavel Stěhule a bydlím ve Skalici.');
    ts_name    | tok_type | description |  token  |     dict_name      | tsvector
---------------+----------+-------------+---------+--------------------+-----------
 default_czech | word     | Word        | Stěhule | {cz_ispell,simple} |
 default_czech | lword    | Latin word  | a       | {cz_ispell,simple} |
 default_czech | word     | Word        | bydlím  | {cz_ispell,simple} | 'bydlet'
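The statement used to change the configuration map in question 2 is not shown. A minimal sketch of how such a change could be made, assuming it was applied to the word and lword token types (the two types that list both dictionaries in the output above):

```sql
-- Hypothetical reconstruction, not the statement actually used:
-- list simple after cz_ispell for the two word token types.
update pg_ts_cfgmap
   set dict_name = '{cz_ispell,simple}'
 where ts_name = 'default_czech'
   and tok_alias in ('word', 'lword');
```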
Thank you
Pavel Stehule