Re: [tsvector] to_tsvector called multiple times

From: "Sven R(dot) Kunze" <srkunze(at)tbz-pariv(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: [tsvector] to_tsvector called multiple times
Date: 2015-05-26 10:29:52
Message-ID: 55644B20.9040404@tbz-pariv.de
Lists: pgsql-general

Thanks. It seems I am indeed using snowball, so I will go ahead and post my
issue on GitHub.

Maybe I just have difficulty understanding the relationships and dependencies
between all these possibly available dictionary/parser/stemmer packages.

What happens if I install all packages for a single language? (hunspell,
myspell, ispell, snowball)

Are they complementary? Do they replace each other?

>>> \dFd
                         List of text search dictionaries
   Schema   |      Name       |                        Description
------------+-----------------+-----------------------------------------------------------
 pg_catalog | danish_stem     | snowball stemmer for danish language
 pg_catalog | dutch_stem      | snowball stemmer for dutch language
 pg_catalog | english_stem    | snowball stemmer for english language
 pg_catalog | finnish_stem    | snowball stemmer for finnish language
 pg_catalog | french_stem     | snowball stemmer for french language
 pg_catalog | german_stem     | snowball stemmer for german language
 pg_catalog | hungarian_stem  | snowball stemmer for hungarian language
 pg_catalog | italian_stem    | snowball stemmer for italian language
 pg_catalog | norwegian_stem  | snowball stemmer for norwegian language
 pg_catalog | portuguese_stem | snowball stemmer for portuguese language
 pg_catalog | romanian_stem   | snowball stemmer for romanian language
 pg_catalog | russian_stem    | snowball stemmer for russian language
 pg_catalog | simple          | simple dictionary: just lower case and check for stopword
 pg_catalog | spanish_stem    | snowball stemmer for spanish language
 pg_catalog | swedish_stem    | snowball stemmer for swedish language
 pg_catalog | turkish_stem    | snowball stemmer for turkish language
(16 rows)
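
The list above only shows which dictionaries exist, not which ones a
configuration actually consults. The catalog query below is a minimal sketch
of how that mapping could be inspected, assuming the built-in 'german'
configuration and the 'default' parser; both names would need adjusting for a
custom setup.

```sql
-- For each token type, list the dictionaries the 'german' configuration
-- consults, in the order they are tried (mapseqno).
-- Assumption: configuration 'german' and parser 'default' are in use.
SELECT tt.alias    AS token_type,
       m.mapseqno  AS lookup_order,
       d.dictname  AS dictionary
FROM pg_ts_config c
JOIN pg_ts_config_map m ON m.mapcfg = c.oid
JOIN pg_ts_dict d ON d.oid = m.mapdict
JOIN ts_token_type('default') tt ON tt.tokid = m.maptokentype
WHERE c.cfgname = 'german'
ORDER BY tt.alias, m.mapseqno;
```

In psql, "\dF+ german" prints the same token-to-dictionary mapping in a more
compact form.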

On 26.05.2015 12:09, Albe Laurenz wrote:
> Sven R. Kunze wrote:
>> However, are you sure I am using snowball? Maybe I am reading the
>> documentation wrong:
> test=> SELECT * FROM ts_debug('german', 'system');
>    alias   |   description   | token  | dictionaries  | dictionary  | lexemes
> -----------+-----------------+--------+---------------+-------------+---------
>  asciiword | Word, all ASCII | system | {german_stem} | german_stem | {syst}
> (1 row)
>
> test=> \dFd german_stem
>           List of text search dictionaries
>    Schema   |    Name     |             Description
> ------------+-------------+--------------------------------------
>  pg_catalog | german_stem | snowball stemmer for german language
> (1 row)
>
>> http://www.postgresql.org/docs/9.3/static/textsearch-dictionaries.html
>> but it seems as if it depends on which packages (ispell, hunspell, myspell,
>> snowball + corresponding languages) my system has installed.
>>
>> Is there an easy way to determine which of these packages PostgreSQL
>> uses AND what for?
> If you use a standard PostgreSQL distribution, you will have no ispell
> dictionary (as the documentation you quote says).
> You can always list all dictionaries with "\dFd" in psql.
>
>> Sure. That might be the problem. It occurs to me that stems (if detected
>> as such) should be left alone.
>> If a stem is a real German word, it should be stemmed to itself anyway.
>> If not, it might help not to stem it, in order to avoid errors.
> Yes, but that would mean that you have a way to determine from a string
> whether it is a word or a stem or both, and the software does not do that.
>
> Yours,
> Laurenz Albe
>

Regards,

--
Sven R. Kunze
TBZ-PARIV GmbH, Bernsdorfer Str. 210-212, 09126 Chemnitz
Tel: +49 (0)371 33714721, Fax: +49 (0)371 5347920
e-mail: srkunze(at)tbz-pariv(dot)de
web: www.tbz-pariv.de

Geschäftsführer: Dr. Reiner Wohlgemuth
Sitz der Gesellschaft: Chemnitz
Registergericht: Chemnitz HRB 8543
