From: | Stefan Keller <sfkeller(at)gmail(dot)com> |
---|---|
To: | Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru> |
Cc: | pgsql-general List <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: FTS with more than one language in body and with unknown query language? |
Date: | 2016-07-14 22:54:38 |
Message-ID: | CAFcOn29yEES4Y=E1c0Nj__8o1Kb_RB4Ey2NZtTYuy7w2DRcjag@mail.gmail.com |
Lists: | pgsql-general |
Hi Artur!
Thanks for your explanations.
2016-07-14 17:20 GMT+02:00 Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>:
> On 14.07.2016 01:16, Stefan Keller wrote:
...
>> * Should I create a synonym dictionary which contains word
>> translations en-de instead of synonyms en-en?
>
> This synonym dictionary will contain thousands of entries, so it will require
> a great effort to make this dictionary.
It's a domain-specific corpus of max. 1000 records of descriptive text
(metadata) about geographic data, like topographic map, land use
planning, etc.
...
>> * How to set up a text search configuration which e.g. stems en and de
>> words?
I would still like to give FTS a try with a synonym dictionary (en-de).
Now I'm wondering how to set up the configuration. I've seen examples
that process English, German or Russian alone, but I have not yet found
any documentation on how to set up a text search configuration
for a corpus that contains two (or more) languages at the same time in
one table (body_en and body_de).
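
To make the en-de synonym idea concrete, here is a minimal sketch in plain Python rather than PostgreSQL. It only illustrates the behaviour I'm after: a query term in either language is expanded through a translation list and then matched against both columns. The two dictionary entries, the sample rows and the helper names (tokenize, expand, search) are made up for illustration, and the lowercase tokenizer is a stand-in for real parsing and stemming.

```python
import re

# Hypothetical en-de translation pairs; a real list would come from the domain corpus.
EN_DE = {"forest": "wald", "map": "karte"}

# Symmetric lookup so that a query in either language expands to both words.
SYNONYMS = {}
for en, de in EN_DE.items():
    SYNONYMS.setdefault(en, set()).update({en, de})
    SYNONYMS.setdefault(de, set()).update({en, de})


def tokenize(text):
    """Lowercase and split into runs of letters (no stemming is done here)."""
    return set(re.findall(r"[^\W\d_]+", text.lower()))


def expand(query):
    """Replace each query token by itself plus its translation, if one is known."""
    terms = set()
    for tok in tokenize(query):
        terms |= SYNONYMS.get(tok, {tok})
    return terms


def search(query, rows):
    """Return ids of rows where any expanded term occurs in body_en or body_de."""
    terms = expand(query)
    return [
        row["id"]
        for row in rows
        if terms & (tokenize(row["body_en"]) | tokenize(row["body_de"]))
    ]


# Made-up sample records with the two text columns from the table.
rows = [
    {"id": 1, "body_en": "Forest cover map", "body_de": ""},
    {"id": 2, "body_en": "", "body_de": "Wald und Flur"},
    {"id": 3, "body_en": "Land use planning", "body_de": ""},
]

print(search("forest", rows))
print(search("Wald", rows))
```

Both queries should return records 1 and 2, since "forest" and "Wald" expand to the same pair of terms. Whether a synonym dictionary inside a PostgreSQL text search configuration can be made to behave the same way is exactly what I'd like to find out.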
:Stefan
2016-07-14 17:20 GMT+02:00 Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>:
> Hi,
>
> On 14.07.2016 01:16, Stefan Keller wrote:
>>
>> Hi,
>>
>> I have a text corpus which contains either German or English docs and
>> I expect queries where I don't know if it's German or English. So I'd
>> like e.g. that a query "forest" matches "forest" in body_en but also
>> "Wald" in body_de.
>>
>> I created a table with attributes body_en and body_de (type "text"). I
>> will use tsvector/tsquery on the fly (I don't need an index on the
>> attributes yet).
>>
>> * Can FTS handle this multilingual situation?
>
>
> In my opinion, PostgreSQL can't handle it. It can't translate words from one
> language to another; it just stems a word from its original form to its base
> form. First you need to translate the word from English to German, then search
> for the word in the body_de attribute.
>
> And the issue is complicated by the fact that one word could have a different
> meaning in the other language.
>
>> * How to set up a text search configuration which e.g. stems en and de
>> words?
>> * Should I create a synonym dictionary which contains word
>> translations en-de instead of synonyms en-en?
>
>
> This synonym dictionary will contain thousands of entries, so it will require
> a great effort to make this dictionary.
>
>
>> * Any hints to related work where FTS has been used in a multilingual
>> context?
>>
>> :Stefan
>>
>>
>
> --
> Artur Zakirov
> Postgres Professional: http://www.postgrespro.com
> Russian Postgres Company