Re: Using a german affix file for compound words

From: Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
To: Wolfgang Winkler <wolfgang(dot)winkler(at)digital-concepts(dot)com>, obartunov(at)gmail(dot)com
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Using a german affix file for compound words
Date: 2016-01-29 09:21:36
Message-ID: 56AB2F20.6030604@postgrespro.ru
Lists: pgsql-general

On 28.01.2016 20:36, Wolfgang Winkler wrote:
> I'm using 9.4.5 as well and I used exactly the same iconv lines as you
> posted below.
>
> Are there any encoding options that have to be set right? The database
> encoding is set to UTF8.
>
> ww

What does the following command output?

-> SHOW LC_CTYPE;

Did you try a dictionary from
http://extensions.openoffice.org/en/project/german-de-de-frami-dictionaries
?
You need to extract the de_DE_frami.aff and de_DE_frami.dic files from the
downloaded archive, rename them, and convert them to UTF-8.
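
A rough sketch of that conversion step as a shell helper. The function name
to_utf8 and the target names german.affix and german.dict are only examples,
and I am assuming the source files are encoded in ISO-8859-1 (check the SET
line at the top of the .aff file; if it already says UTF-8, skip the
conversion):

```shell
# Hypothetical helper: convert a dictionary file from ISO-8859-1 to UTF-8
# and confirm that the result is valid UTF-8.
to_utf8() {
    src=$1
    dst=$2
    # Assumption: the input is ISO-8859-1 encoded.
    iconv -f ISO-8859-1 -t UTF-8 "$src" > "$dst" || return 1
    # Re-reading the output as UTF-8 fails if it contains invalid sequences.
    iconv -f UTF-8 -t UTF-8 "$dst" > /dev/null
}

# Example usage (file names are assumptions; adjust to your installation):
# to_utf8 de_DE_frami.aff german.affix
# to_utf8 de_DE_frami.dic german.dict
# cp german.affix german.dict "$(pg_config --sharedir)/tsearch_data/"
```

The second iconv call is only a sanity check, so a broken conversion shows up
before PostgreSQL tries to parse the file.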

>
> On 2016-01-28 at 17:34, Artur Zakirov wrote:
>> On 28.01.2016 18:57, Oleg Bartunov wrote:
>>>
>>>
>>> On Thu, Jan 28, 2016 at 6:04 PM, Wolfgang Winkler
>>> <wolfgang(dot)winkler(at)digital-concepts(dot)com> wrote:
>>>
>>> Hi!
>>>
>>> We have a problem with importing a compound dictionary file for
>>> German.
>>>
>>> I downloaded the files here:
>>>
>>> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/dicts/ispell/ispell-german-compound.tar.gz
>>>
>>> and converted them to utf-8 with iconv. The affix file seems ok when
>>> opened with an editor.
>>>
>>> When I try to create or alter a dictionary to use this affix file, I
>>> get the following error:
>>>
>>> alter TEXT SEARCH DICTIONARY german_ispell (
>>> DictFile = german,
>>> AffFile = german,
>>> StopWords = german
>>> );
>>> ERROR: syntax error
>>> CONTEXT: line 224 of configuration file
>>> "/usr/local/pgsql/share/tsearch_data/german.affix": " ABE >
>>> -ABE,äBIN
>>> "
>>>
>>> This is the first occurrence of an umlaut character in the file.
>>> I've found a few postings where the same file is used, e.g.:
>>>
>>> http://www.postgresql.org/message-id/flat/556C1411(dot)4010608(at)tbz-pariv(dot)de#556C1411(dot)4010608@tbz-pariv.de
>>>
>>> This user was able to import the file. Am I missing something
>>> obvious?
>>>
>>
>> What version of PostgreSQL do you use?
>>
>> I tested this dictionary on PostgreSQL 9.4.5. I downloaded the files
>> from the link and ran these commands:
>>
>> iconv -f ISO-8859-1 -t UTF-8 german.aff -o german2.affix
>> iconv -f ISO-8859-1 -t UTF-8 german.dict -o german2.dict
>>
>> I renamed them to german.affix and german.dict and moved them to the
>> tsearch_data directory. The following commands ran without errors:
>>
>> -> create text search dictionary german_ispell (
>> Template = ispell,
>> DictFile = german,
>> AffFile = german,
>> Stopwords = german
>> );
>> CREATE TEXT SEARCH DICTIONARY
>>
>> -> select ts_lexize('german_ispell', 'test');
>> ts_lexize
>> -----------
>> {test}
>> (1 row)
>>
>
>
> --
>
> *Wolfgang Winkler*
> Geschäftsführung
> wolfgang(dot)winkler(at)digital-concepts(dot)com
> mobil +43.699.19971172
>
> dc:*büro*
> digital concepts Novak Winkler OG
> Software & Design
> Landstraße 68, 5. Stock, 4020 Linz
> www.digital-concepts.com <http://www.digital-concepts.com>
> tel +43.732.997117.72
> tel +43.699.1997117.2
>
> Firmenbuchnummer: 192003h
> Firmenbuchgericht: Landesgericht Linz
>
>
>

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
