Re: Fulltext search configuration

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Mohamed <mohamed5432154321(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Fulltext search configuration
Date: 2009-02-02 14:50:00
Message-ID: Pine.LNX.4.64.0902021746080.4158@sn.sai.msu.ru
Lists: pgsql-general

Mohamed,

Comment out the FLAG line in ar.affix, so that it reads
#FLAG long
and creating the ispell dictionary will work.
This is a temporary solution.
Teodor is working on fixing the affix format autorecognition.
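
A minimal sketch of re-creating the dictionary once that line has been edited. It assumes the edited files are installed as ar.affix and ar.dict in the server's tsearch_data directory and that an arabic.stop file is there too. The dictionary name ar_ispell comes from Daniel's example; the stop-word file name is an assumption.

```sql
-- ar.affix has already been edited by hand so that its FLAG line reads
--   #FLAG long
-- CREATE TEXT SEARCH DICTIONARY loads the files, so affix format errors
-- such as "wrong affix file format for flag" are reported at this point.
-- (The DROP fails if a text search configuration still uses ar_ispell.)
DROP TEXT SEARCH DICTIONARY IF EXISTS ar_ispell;

CREATE TEXT SEARCH DICTIONARY ar_ispell (
    TEMPLATE = ispell,
    DictFile = ar,       -- reads tsearch_data/ar.dict (assumed file name)
    AffFile = ar,        -- reads tsearch_data/ar.affix
    StopWords = arabic   -- reads tsearch_data/arabic.stop (assumed file name)
);
```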

I can't say anything about testing, since somebody has to provide
a first test case. I don't know how to type Arabic :)
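
A first test case could be as small as the query below. It assumes a dictionary named ar_ispell (the name from Daniel's example) has been created; 'ARABIC WORD' is a placeholder, as in Mohamed's message, and has to be replaced by a real Arabic word form. ts_lexize returns an array of lexemes if the dictionary knows the word, an empty array if the word is a stop word, and NULL if the word is unknown, so "nothing" from ts_lexize in psql most likely means NULL.

```sql
-- 'ARABIC WORD' is a placeholder for a real Arabic word form.
SELECT ts_lexize('ar_ispell', 'ARABIC WORD')         AS lexemes,
       ts_lexize('ar_ispell', 'ARABIC WORD') IS NULL AS unknown_word;
-- lexemes = {...}   : the dictionary recognized the word
-- lexemes = {}      : the word is a stop word
-- unknown_word = t  : the dictionary does not know the word
```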

Oleg

On Mon, 2 Feb 2009, Mohamed wrote:

> Oleg, as I mentioned earlier, I have a different .affix file that I got
> from Andrew along with the stop file. I get no errors creating the dictionary
> with that one, but I get nothing out of ts_lexize.
> The size of that one is 406,219 bytes,
> and the size of the hunspell one (the first) is 406,229 bytes.
>
> A little too close, don't you think?
>
> It might be that the Arabic hunspell (Ayaspell) affix file is damaged on
> some lines and that the one I got from Andrew is the fixed one.
>
> Just wanted to let you know.
>
> / Moe
>
>
>
> On Mon, Feb 2, 2009 at 3:25 PM, Mohamed <mohamed5432154321(at)gmail(dot)com> wrote:
>
>> Ok, thank you Oleg.
>> I have another dictionary package, which is also a conversion to hunspell:
>>
>>
>> http://wiki.services.openoffice.org/wiki/Dictionaries#Arabic_.28North_Africa_and_Middle_East.29
>> (Conversion of Buckwalter's Arabic morphological analyser) 2006-02-08
>>
>> Running that one gives me this error (again in the affix file):
>>
>> ERROR: wrong affix file format for flag
>> CONTEXT: line 560 of configuration file "C:/Program Files/PostgreSQL/8.3/share/tsearch_data/arabic_utf8_alias.affix": "PFX 1013 Y 6
>> "
>>
>> / Moe
>>
>>
>>
>> On Mon, Feb 2, 2009 at 2:41 PM, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
>>
>>> Mohamed,
>>>
>>> We are looking into the problem.
>>>
>>> Oleg
>>>
>>> On Mon, 2 Feb 2009, Mohamed wrote:
>>>
>>>> No, I don't. But ts_lexize doesn't return anything, so I figured there
>>>> must be an error somewhere.
>>>> I think we are using the same dictionary, except that I am using the
>>>> stopwords file and a different affix file, because using the hunspell
>>>> (Ayaspell) .aff gives me this error:
>>>>
>>>> ERROR: wrong affix file format for flag
>>>> CONTEXT: line 42 of configuration file "C:/Program Files/PostgreSQL/8.3/share/tsearch_data/hunarabic.affix": "PFX Aa Y 40
>>>>
>>>> / Moe
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Feb 2, 2009 at 12:13 PM, Daniel Chiaramello <daniel(dot)chiaramello(at)golog(dot)net> wrote:
>>>>
>>>>> Hi Mohamed.
>>>>>
>>>>> I don't know where you got the dictionary. I tried the OpenOffice one
>>>>> (the Ayaspell one) myself without success, and I had no Arabic
>>>>> stopwords file.
>>>>>
>>>>> Renaming the files is supposed to be enough (I did it successfully for a
>>>>> Thai dictionary): the ".aff" file becomes the ".affix" one.
>>>>> When I tried to create the dictionary:
>>>>>
>>>>> CREATE TEXT SEARCH DICTIONARY ar_ispell (
>>>>> TEMPLATE = ispell,
>>>>> DictFile = ar_utf8,
>>>>> AffFile = ar_utf8,
>>>>> StopWords = english
>>>>> );
>>>>>
>>>>> I got an error:
>>>>>
>>>>> ERROR: wrong affix file format for flag
>>>>> CONTEXT: line 42 of configuration file
>>>>> "/usr/share/pgsql/tsearch_data/ar_utf8.affix": "PFX Aa Y 40
>>>>>
>>>>> (translated from the French messages my server prints)
>>>>>
>>>>> Do you get an error when creating your dictionary?
>>>>>
>>>>> Daniel
>>>>>
>>>>> Mohamed wrote:
>>>>>
>>>>>
>>>>> I have run into some problems here.
>>>>> I am trying to implement Arabic full-text search on three columns.
>>>>>
>>>>> To create a dictionary I have a hunspell dictionary and an Arabic stop
>>>>> words file.
>>>>>
>>>>> CREATE TEXT SEARCH DICTIONARY hunspell_dic (
>>>>> TEMPLATE = ispell,
>>>>> DictFile = hunarabic,
>>>>> AffFile = hunarabic,
>>>>> StopWords = arabic
>>>>> );
>>>>>
>>>>>
>>>>> 1) The problem is that the hunspell package contains a .dic and an .aff
>>>>> file, but the configuration requires .dict and .affix files. I have tried
>>>>> changing the extensions, but with no success.
>>>>>
>>>>> 2) ts_lexize('hunspell_dic', 'ARABIC WORD') returns nothing.
>>>>>
>>>>> 3) How can I convert my .dic and .aff files to valid .dict and .affix files?
>>>>>
>>>>> 4) I have read that when using dictionaries, if a word is not recognized
>>>>> by any dictionary, it will not be indexed. I find that troublesome: I
>>>>> would like everything but the stop words to be indexed. I guess this might
>>>>> be a step that I am not ready for yet, but I just wanted to put it out
>>>>> there.
>>>>>
>>>>>
>>>>>
>>>>> Also, I would like to know what the process of a full-text search
>>>>> implementation looks like, from configuration to search.
>>>>>
>>>>> Create a dictionary, then a text search configuration, add the dictionary
>>>>> to the configuration, index the columns with GIN or GiST ...
>>>>>
>>>>> What does a search look like? Does it match against the GIN/GiST index?
>>>>> Has that index been built using the dictionary/configuration, or is the
>>>>> dictionary only used on search phrases?
>>>>>
>>>>> / Moe
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>> Regards,
>>> Oleg
>>> _____________________________________________________________
>>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
>>> Sternberg Astronomical Institute, Moscow University, Russia
>>> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
>>> phone: +007(495)939-16-83, +007(495)939-23-83
>>>
>>
>>
>
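
The process Mohamed asks about (dictionary, configuration, index, search) can be sketched roughly as follows. This is a minimal sketch under assumptions: the table articles and its columns title, summary and body are hypothetical stand-ins for his three columns, hunspell_dic is the dictionary from his message, and arabic_simple is a hypothetical simple dictionary reusing the arabic.stop file. Listing arabic_simple after hunspell_dic means a word the hunspell dictionary does not recognize is still indexed (lower-cased) while stop words are dropped, which addresses point 4. The same configuration is applied when the index is built and when the query text is parsed, so the dictionaries are used on both sides.

```sql
-- Fallback dictionary: lower-cases the token, drops stop words listed in
-- tsearch_data/arabic.stop (assumed to exist), returns everything else.
CREATE TEXT SEARCH DICTIONARY arabic_simple (
    TEMPLATE = pg_catalog.simple,
    StopWords = arabic
);

-- Start from a copy of the built-in 'simple' configuration so that
-- mappings for the default parser's token types already exist.
CREATE TEXT SEARCH CONFIGURATION arabic_cfg (COPY = pg_catalog.simple);

-- Word tokens are tried against hunspell_dic first; words it does not
-- recognize fall through to arabic_simple instead of being discarded.
ALTER TEXT SEARCH CONFIGURATION arabic_cfg
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH hunspell_dic, arabic_simple;

-- Hypothetical table holding the three searchable columns.
CREATE TABLE articles (
    id      serial PRIMARY KEY,
    title   text,
    summary text,
    body    text
);

-- Expression index: the stored tsvectors are computed with arabic_cfg.
-- The two-argument form of to_tsvector is needed in an index expression.
CREATE INDEX articles_fts_idx ON articles USING gin (
    to_tsvector('arabic_cfg',
                coalesce(title, '') || ' ' ||
                coalesce(summary, '') || ' ' ||
                coalesce(body, ''))
);

-- Search: the WHERE clause repeats the indexed expression so the planner
-- can use the GIN index; the search phrase goes through the same
-- configuration. 'ARABIC WORD' is a placeholder.
SELECT id, title
FROM articles
WHERE to_tsvector('arabic_cfg',
                  coalesce(title, '') || ' ' ||
                  coalesce(summary, '') || ' ' ||
                  coalesce(body, ''))
      @@ plainto_tsquery('arabic_cfg', 'ARABIC WORD');
```

Because the index stores tsvectors computed with arabic_cfg, changing its dictionaries later (for example after fixing the affix file) means the index should be rebuilt, e.g. with REINDEX INDEX articles_fts_idx.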

Regards,
Oleg
