From: | "Sven R(dot) Kunze" <srkunze(at)tbz-pariv(dot)de> |
---|---|
To: | obartunov(at)gmail(dot)com |
Cc: | Postgres General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: [tsvector] to_tsvector called multiple times |
Date: | 2015-05-26 09:29:53 |
Message-ID: | 55643D11.1040604@tbz-pariv.de |
Lists: | pgsql-general |
Thanks, Oleg. Unfortunately, that does not work very well, as German has
many compound nouns.
In fact, I discovered this anomaly while searching a domain-specific word
table, e.g. for 'Waferhandlingsystem'. There are many '*system' compounds,
but PostgreSQL does not let me mark a synonym as a suffix; it supports only
prefix matching, and only in to_tsquery
(http://www.postgresql.org/docs/9.3/static/textsearch-dictionaries.html#TEXTSEARCH-SYNONYM-DICTIONARY).
Is there another possibility?
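For readers unfamiliar with the prefix support mentioned above, here is a minimal sketch of prefix matching in to_tsquery via the `:*` marker. It assumes the built-in 'simple' configuration (chosen only so that no stemming interferes), and the sample words are illustrations. tsquery has no corresponding suffix marker, which is why a compound ending in 'system' is not found by a query for 'system'.

```sql
-- Prefix match: 'system:*' matches any lexeme that starts with "system".
SELECT to_tsvector('simple', 'systemadministrator')
       @@ to_tsquery('simple', 'system:*');   -- t

-- No suffix match: the compound is indexed as one lexeme,
-- so a query for 'system' does not find it.
SELECT to_tsvector('simple', 'waferhandlingsystem')
       @@ to_tsquery('simple', 'system');     -- f
```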
On 26.05.2015 11:05, Oleg Bartunov wrote:
> You can ask the Snowball project (http://snowball.tartarus.org/) about the
> stemmer. Meanwhile, you can put a small personal dictionary with such
> exceptions in front of the stemmer, for example using the synonym template:
>
> system system
>
> Oleg
>
>
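Oleg's suggestion might look roughly like the following sketch. The dictionary name `german_exceptions`, the configuration name `german_custom`, and the file `german_exceptions.syn` are hypothetical; the file is assumed to sit in `$SHAREDIR/tsearch_data/` and to contain the single line `system system`. Because the synonym dictionary is listed before `german_stem`, a word it recognizes is returned as-is, while unknown words fall through to the stemmer.

```sql
-- Hypothetical names throughout. Assumes the file
-- $SHAREDIR/tsearch_data/german_exceptions.syn contains the line:
--   system system
CREATE TEXT SEARCH DICTIONARY german_exceptions (
    TEMPLATE = synonym,
    SYNONYMS = german_exceptions
);

-- Copy the built-in German configuration so the original stays untouched.
CREATE TEXT SEARCH CONFIGURATION german_custom (COPY = german);

-- Consult the exception list first; unrecognized words go on to german_stem.
ALTER TEXT SEARCH CONFIGURATION german_custom
    ALTER MAPPING FOR asciiword, word WITH german_exceptions, german_stem;

SELECT to_tsvector('german_custom', 'system');
```

With the exception in place, 'system' would be expected to stay 'system' instead of being reduced to 'syst'.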
> On Tue, May 26, 2015 at 11:18 AM, Sven R. Kunze <srkunze(at)tbz-pariv(dot)de> wrote:
>
> Hi everybody,
>
> the following stemming results made me curious:
>
> select to_tsvector('german', 'systeme');   -- 'system':1
> select to_tsvector('german', 'systemes');  -- 'system':1
> select to_tsvector('german', 'systems');   -- 'system':1
> select to_tsvector('german', 'systemen');  -- 'system':1
> select to_tsvector('german', 'system');    -- 'syst':1
>
>
> First of all, this seems to be a bug in the German stemmer. Where
> can I fix it?
>
> Second, and more importantly, as I understand it, the stemmed
> version of a word should be considered normalized. That is, all
> other inflected forms of that word should be mapped to it as well. The
> interesting problem here is that PostgreSQL maps the stem itself
> ('system') to a completely different stem ('syst').
>
> Should a stem not remain stable even when to_tsvector is called on
> it multiple times?
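One way to test the stability question directly is to feed each stem back into the stemmer. This sketch calls `ts_lexize` on the built-in `german_stem` dictionary (the dictionary the `german` configuration uses for ordinary words) twice per input; wherever the `once` and `twice` columns differ, stemming is not idempotent for that word. The word list is taken from the examples above.

```sql
-- Stem each word once, then stem the first resulting lexeme again.
-- ts_lexize returns a text[]; for a stop word it is empty, so [1] is NULL
-- and the second call (a strict function) also yields NULL.
SELECT w,
       ts_lexize('german_stem', w)                                AS once,
       ts_lexize('german_stem', (ts_lexize('german_stem', w))[1]) AS twice
FROM unnest(ARRAY['systeme', 'systemes', 'systems', 'systemen', 'system']) AS w;
```

Given the to_tsvector results quoted above, one would expect 'systeme' to show {system} in `once` but {syst} in `twice`.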
>
> --
> Sven R. Kunze
> TBZ-PARIV GmbH, Bernsdorfer Str. 210-212, 09126 Chemnitz
> Tel: +49 (0)371 33714721, Fax: +49 (0)371 5347920
> e-mail: srkunze(at)tbz-pariv(dot)de
> web: www.tbz-pariv.de
>
> Managing Director: Dr. Reiner Wohlgemuth
> Registered office: Chemnitz
> Registry court: Chemnitz HRB 8543
>