Re: TSearch2: Problems with compound words and stop words

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Timo Haberkern <thaberkern(at)emedia-office(dot)de>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: TSearch2: Problems with compound words and stop words
Date: 2004-11-05 11:19:27
Message-ID: Pine.GSO.4.61.0411051415250.29410@ra.sai.msu.su
Lists: pgsql-general

On Fri, 5 Nov 2004, Timo Haberkern wrote:

> Oleg,
>
> I use TSearch2 with PostgreSQL 7.4.6 and I applied the compound word patch
> yesterday. The configuration changed a little, but the result is the same:
> I get no compound words. I'm using the locale de_DE with encoding ISO8859-1
> for the database.
>
> I think ispell is working correctly except for the compound words. If I try
>
> SELECT lexize('de_ispell', 'springt')
>
> i get
>
> lexize
> {springen,springen}
>
> which seems correct.
>
>
> But a SELECT lexize('de_ispell', 'Autobahn')
>
> results in
>
> lexize
> {autobahn}
>
> I would expect {auto,bahn,autobahn}

Hmm, have you checked whether 'Autobahn' is in the ispell dictionary? Does the
dictionary you used support the 'Z' flag for compound words?
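
One way to run that check outside the database is a small script over the converted files. This is only a sketch under assumptions: it assumes the usual ispell layout (one `word/FLAGS` entry per line in the .dict file) and that the .aff file names the compound flag in a `compoundwords controlled <flag>` line. The function names and the sample data are made up for illustration; they are not part of tsearch2 or my2ispell.

```python
import re

def compound_flag(aff_text):
    """Return the flag named by a 'compoundwords controlled X' line, or None."""
    for line in aff_text.splitlines():
        m = re.match(r"\s*compoundwords\s+controlled\s+(\S)", line, re.IGNORECASE)
        if m:
            return m.group(1)
    return None

def word_flags(dict_text, word):
    """Return the set of flags attached to `word` in the .dict text, or None if absent."""
    target = word.lower()
    found = None
    for line in dict_text.splitlines():
        entry, _, flags = line.strip().partition("/")
        if entry.lower() == target:
            # A word may be listed more than once; collect all of its flags.
            found = (found or set()) | set(flags)
    return found

def can_compound(aff_text, dict_text, word):
    """True only if the word is listed and carries the declared compound flag."""
    flag = compound_flag(aff_text)
    flags = word_flags(dict_text, word)
    return flag is not None and flags is not None and flag in flags

# Hypothetical sample data, NOT taken from the real German dictionary.
SAMPLE_AFF = "compoundwords controlled z\n"
SAMPLE_DICT = "Auto/z\nBahn/z\nHaus\n"

for w in ("Auto", "Bahn", "Haus", "Autobahn"):
    print(w, word_flags(SAMPLE_DICT, w), can_compound(SAMPLE_AFF, SAMPLE_DICT, w))
```

For the real files, read german_comb.aff and german_comb.dict with `open(path, encoding="latin-1")` (the database here uses ISO8859-1) and pass the contents in place of the sample strings. The flag comparison is case-sensitive, so 'z' and 'Z' are treated as different flags.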

>
> The new configuration after the compound word patch:
>

Seems you overestimate my capabilities :)

>
> pg_ts_dict:
>
> dict_name | dict_init | dict_initoption | dict_lexize | dict_comment
> simple | dex_init(text) | NULL | dex_lexize(internal,internal,integer) | Simple example of dictionary.
> en_stem | snb_en_init(text) | /usr/local/pgsql/share/contrib/english.stop | snb_lexize(internal,internal,integer) | English Stemmer. Snowball.
> ru_stem | snb_ru_init(text) | /usr/local/pgsql/share/contrib/russian.stop | snb_lexize(internal,internal,integer) | Russian Stemmer. Snowball.
> ispell_template | spell_init(text) | NULL | spell_lexize(internal,internal,integer) | ISpell interface. Must have .dict and .aff files
> synonym | syn_init(text) | NULL | syn_lexize(internal,internal,integer) | Example of synonym dictionary
> de_ispell | spell_init(text) |
> DictFile="/usr/local/pgsql/share/contrib/dictonary/german_comb.dict",
> AffFile="/usr/local/pgsql/share/contrib/dictonary/german_comb.aff",
> StopFile="/usr/local/pgsql/share/contrib/dictonary/german.stop" |
> spell_lexize(internal,internal,integer) | NULL
>
>
>
> Timo
>
>
> Oleg Bartunov wrote:
>
>> Timo,
>>
>> please check that you applied the patch for compound word support.
>> Which version of PostgreSQL are you using?
>> Does the ispell dictionary work for non-compound words?
>>
>> Oleg
>>
>> On Fri, 5 Nov 2004, Timo Haberkern wrote:
>>
>>> Hi there,
>>>
>>> I have some trouble with my TSearch2 installation. I did the
>>> installation as described in
>>> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_compound_words
>>>
>>> I used the German MySpell dictionary from
>>> http://lingucomponent.openoffice.org/spell_dic.html and converted it with
>>> my2ispell.
>>>
>>> Nearly everything is working fine so far, except for two problems:
>>>
>>> 1.) The stop word file seems to be ignored: if I try SELECT
>>> to_tsvector('default_german', 'ein Haus') I get 'ein':1 'haus':2
>>>
>>> "ein" should be a stop word for German (and it is defined in the
>>> german.stop file as well).
>>>
>>> 2.) The compound words feature doesn't work either. I have tried a lot of
>>> words, e.g. 'Fehlermeldung': with SELECT to_tsvector('default_german',
>>> 'Fehlermeldung') I only get 'fehlermeldung':1, but I would expect
>>> 'fehler' and 'meldung' as separate entries. Is there anything wrong with
>>> the dictionary or my configuration?
>>>
>>>
>>> My current configuration:
>>>
>>> pg_ts_cfg:
>>>
>>> default default C
>>> default_russian default ru_RU.KOI8-R
>>> simple default NULL
>>> default_german default de_DE.ISO8859-1
>>>
>>> pg_ts_cfgmap:
>>>
>>> default_german host {simple}
>>> default_german hword {simple}
>>> default_german int {simple}
>>> default_german nlhword {simple}
>>> default_german nlpart_hword {simple}
>>> default_german nlword {simple}
>>> default_german part_hword {simple}
>>> default_german sfloat {simple}
>>> default_german uint {simple}
>>> default_german uri {simple}
>>> default_german url {simple}
>>> default_german version {simple}
>>> default_german word {simple}
>>> default_german lpart_hword {de_ispell,german_snowball}
>>> default_german lword {de_ispell,german_snowball}
>>> default_german lhword {de_ispell,german_snowball}
>>>
>>>
>>> pg_ts_dict:
>>>
>>> de_ispell | 17166 |
>>> DictFile="/usr/local/pgsql/share/contrib/dictonary/german.dict",
>>> AffFile="/usr/local/pgsql/share/contrib/dictonary/german.aff",
>>> StopFile="/usr/local/pgsql/share/contrib/dictonary/german.stop" | 17167
>>> | NULL
>>> german_snowball | 17357 | NULL | 17162 | Snowball stemmer for german
>>>
>>>
>>>
>>> Can anyone help me?
>>>
>>> regards
>>>
>>> Timo
>>>
>>>
>>> ---------------------------(end of broadcast)---------------------------
>>> TIP 4: Don't 'kill -9' the postmaster
>>>
>>
>> Regards,
>> Oleg
>> _____________________________________________________________
>> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
>> Sternberg Astronomical Institute, Moscow University (Russia)
>> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
>> phone: +007(095)939-16-83, +007(095)939-23-83
>>
>>
>
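
On the stop word question from the first message: tsearch2 hands a token to the dictionaries listed for its type in order, and the first dictionary that answers with something other than NULL decides the result; an empty result means "stop word, drop it". In the quoted pg_ts_dict only de_ispell has a StopFile, while german_snowball has NULL options. The toy model below shows how 'ein' could still be indexed in that setup. It rests on an unverified assumption, namely that de_ispell does not recognise 'ein' and therefore answers NULL before its stop list is consulted (lexize('de_ispell', 'ein') would confirm or refute this). All names and the tiny word lists are hypothetical and only mimic the lookup order.

```python
def ispell_lexize(word, known, stop):
    """Toy ispell: None if the word is unknown, otherwise its forms minus stop words."""
    forms = known.get(word.lower())
    if forms is None:
        return None
    return [f for f in forms if f not in stop]

def snowball_lexize(word, stop):
    """Toy stemmer: accepts every word; an empty list marks a stop word."""
    w = word.lower()
    return [] if w in stop else [w]

def lexize_chain(word, dictionaries):
    """Return the answer of the first dictionary that does not say None."""
    for lexize in dictionaries:
        result = lexize(word)
        if result is not None:
            return result
    return None

# Hypothetical data: 'ein' is in the stop list but missing from the ispell word list.
GERMAN_STOP = {"ein", "der", "die", "das"}
KNOWN = {"haus": ["haus"]}

# Chain as in the quoted configuration: stop list on ispell only.
as_configured = [
    lambda w: ispell_lexize(w, KNOWN, GERMAN_STOP),
    lambda w: snowball_lexize(w, set()),
]
# Same chain, but the stemmer also gets the stop list.
with_stop_file = [
    lambda w: ispell_lexize(w, KNOWN, GERMAN_STOP),
    lambda w: snowball_lexize(w, GERMAN_STOP),
]

for w in ("ein", "Haus"):
    print(w, lexize_chain(w, as_configured), lexize_chain(w, with_stop_file))
```

If lexize('de_ispell', 'ein') really returns NULL on the live system, giving german_snowball the same stop file in its dict_initoption, the way en_stem points at english.stop in the listing above, would be the thing to try.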

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
