Re: FTS with more than one language in body and with unknown query language?

From: Stefan Keller <sfkeller(at)gmail(dot)com>
To: Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
Cc: pgsql-general List <pgsql-general(at)postgresql(dot)org>
Subject: Re: FTS with more than one language in body and with unknown query language?
Date: 2016-07-14 22:54:38
Message-ID: CAFcOn29yEES4Y=E1c0Nj__8o1Kb_RB4Ey2NZtTYuy7w2DRcjag@mail.gmail.com
Lists: pgsql-general

Hi Artur!

Thanks for your explanations.

2016-07-14 17:20 GMT+02:00 Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>:
> On 14.07.2016 01:16, Stefan Keller wrote:
...
>> * Should I create a synonym dictionary which contains word
>> translations en-de instead of synonyms en-en?
>
> This synonym dictionary will contain thousands of entries. So it will require
> a great effort to make this dictionary.

It's a domain-specific corpus of at most 1000 records of descriptive
text (metadata) about geographic data, such as topographic maps, land
use planning, etc.

...
>> * How to set up a text search configuration which e.g. stems en and de
>> words?

I would still like to give FTS a try with a synonym dictionary (en-de).
Now I'm wondering how to set up the configuration. I've seen examples
that process English, German or Russian alone, but I have not yet
found any documentation on how to set up a text search configuration
for a corpus that contains two (or more) languages at the same time in
one table (body_en and body_de).
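
For illustration, one possible arrangement is sketched below. It is
only a sketch under stated assumptions, not a tested setup: the
dictionary name en_de_syn, the configuration name en_de, the synonym
file en_de.syn and the table name docs with an id column are all
hypothetical. The idea is to map English terms onto the lexemes the
German stemmer produces, so that one query can be matched against both
columns.

```sql
-- Assumed synonym file $SHAREDIR/tsearch_data/en_de.syn, one pair per line.
-- The right-hand side must already be the German-stemmed lexeme, because the
-- synonym dictionary emits it as-is (no further stemming is applied), e.g.:
--   forest  wald
--   forests wald

CREATE TEXT SEARCH DICTIONARY en_de_syn (
    TEMPLATE = synonym,
    SYNONYMS = en_de          -- file name without the .syn extension
);

-- Start from the built-in English configuration ...
CREATE TEXT SEARCH CONFIGURATION en_de (COPY = pg_catalog.english);

-- ... and consult the en-de synonyms first; words not listed there
-- fall through to the English stemmer.
ALTER TEXT SEARCH CONFIGURATION en_de
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH en_de_syn, english_stem;

-- On-the-fly search without an index: the same query is checked against
-- body_en (via en_de) and body_de (via the built-in german configuration).
SELECT id
FROM docs                      -- hypothetical table name
WHERE to_tsvector('en_de',  body_en) @@ to_tsquery('en_de', 'forest')
   OR to_tsvector('german', body_de) @@ to_tsquery('en_de', 'forest');
```

With this arrangement only terms listed in the synonym file cross the
language boundary. A German query word that is not in the file would be
stemmed by the English stemmer and may therefore miss German-stemmed
lexemes in body_de.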

:Stefan

2016-07-14 17:20 GMT+02:00 Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>:
> Hi,
>
> On 14.07.2016 01:16, Stefan Keller wrote:
>>
>> Hi,
>>
>> I have a text corpus which contains either German or English docs and
>> I expect queries where I don't know if it's German or English. So I'd
>> like e.g. that a query "forest" matches "forest" in body_en but also
>> "Wald" in body_de.
>>
>> I created a table with attributes body_en and body_de (type "text"). I
>> will use tsvector/tsquery on the fly (I don't need an index
>> (attributes) yet).
>>
>> * Can FTS handle this multilingual situation?
>
>
> In my opinion, PostgreSQL can't handle it. It can't translate words from one
> language to another; it just stems a word from its original form to its base
> form. First you need to translate the word from English to German, then
> search for the word in the body_de attribute.
>
> And the issue is complicated by the fact that one word could have different
> meanings in the other language.
>
>> * How to set up a text search configuration which e.g. stems en and de
>> words?
>> * Should I create a synonym dictionary which contains word
>> translations en-de instead of synonyms en-en?
>
>
> This synonym dictionary will contain thousands of entries. So it will require
> a great effort to make this dictionary.
>
>
>> * Any hints to related work where FTS has been used in a multilingual
>> context?
>>
>> :Stefan
>>
>>
>
> --
> Artur Zakirov
> Postgres Professional: http://www.postgrespro.com
> Russian Postgres Company
