From: | Tommy Gildseth <tommy(dot)gildseth(at)usit(dot)uio(dot)no> |
---|---|
To: | |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Text search with ispell |
Date: | 2009-01-27 15:37:19 |
Message-ID: | 497F2A2F.2010207@usit.uio.no |
Lists: | pgsql-general |
Oleg Bartunov wrote:
> On Tue, 27 Jan 2009, Tommy Gildseth wrote:
>
>> Tommy Gildseth wrote:
>>> Oleg Bartunov wrote:
>>>> Have you read
>>>> http://www.postgresql.org/docs/current/static/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY
>>>> We suggest using the dictionaries that come with OpenOffice; hunspell
>>>> probably has better support for compound words.
>>>>
>>>
>>> Thanks, that knocked me onto the right track. Too easy to miss the
>>> blindingly obvious at times. :-)
>>> Works beautifully now.
>>>
>>
>> I may have been too quick to declare success.
>>
>> The following works as expected, returning the individual words:
>> SELECT
>> ts_debug('norwegian', 'overbuljongterningpakkmesterassistent'),
>> ts_debug('norwegian', 'sjokoladefabrikk'),
>> ts_debug('norwegian', 'epleskrott');
>> -[ RECORD 1 ]---------------------------------------------------------------
>> ts_debug | (asciiword,"Word, all ASCII",overbuljongterningpakkmesterassistent,"{no_ispell,norwegian_stem}",no_ispell,"{buljong,terning,pakk,mester,assistent}")
>> ts_debug | (asciiword,"Word, all ASCII",sjokoladefabrikk,"{no_ispell,norwegian_stem}",no_ispell,"{sjokoladefabrikk,sjokolade,fabrikk}")
>> ts_debug | (asciiword,"Word, all ASCII",epleskrott,"{no_ispell,norwegian_stem}",no_ispell,"{epleskrott,eple,skrott}")
>>
>>
>>
>> But the following does not:
>> SELECT
>> ts_debug('norwegian', 'hemsedalsdans'),
>> ts_debug('norwegian', 'lærdalsbrua'),
>> ts_debug('norwegian', 'hengesmykke');
>> -[ RECORD 1 ]---------------------------------------------------------------
>> ts_debug | (asciiword,"Word, all ASCII",hemsedalsdans,"{no_ispell,norwegian_stem}",norwegian_stem,{hemsedalsdan})
>> ts_debug | (word,"Word, all letters",lærdalsbrua,"{no_ispell,norwegian_stem}",norwegian_stem,{lærdalsbru})
>> ts_debug | (asciiword,"Word, all ASCII",hengesmykke,"{no_ispell,norwegian_stem}",norwegian_stem,{hengesmykk})
>>
>>
>>
>> Would this be due to a limitation in the dictionary, or a
>> misconfiguration on my side?
>
> Sorry, I don't know Norwegian. What do you mean? Are you complaining that
> no_ispell doesn't recognize these words?
Yes, I'm sorry, I should have explained better.
The words hemsedalsdans, hengesmykke and lærdalsbrua are
"concatenations" of Hemsedal and dans, henge and smykke, and
Lærdal and bru, respectively. Hemsedal and Lærdal are in fact geographic
names, so I'm not sure it would handle those at all anyway. Both parts of
hengesmykke, however, are in the dictionary, i.e. both henge and smykke.
It seems that it is able to split some words properly, while others it
doesn't recognise.
The problem I'm trying to work around is that, as far as I can tell,
tsearch doesn't support truncation, i.e. searching for "*smykke",
"hemsedal*", etc.
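For context, a configuration consistent with the ts_debug output above (no_ispell consulted first, with words it does not recognise falling through to norwegian_stem) could look roughly like the sketch below. The file name nb_no, the stop-word list and the set of mapped token types are assumptions, not the actual setup used in this thread. Note that the ispell template only decomposes compounds when the affix file enables compounding (in ispell syntax, a line such as `compoundwords controlled z`) and the component words carry that flag in the .dict file, so henge and smykke can both be present in the dictionary and still never be combined. Hemsedalsdans and lærdalsbrua also contain a linking "s" (and brua is the definite form of bru), which the dictionary would have to account for as well.

```sql
-- Hypothetical setup: assumes nb_no.dict and nb_no.affix (ispell-format files
-- derived from the OpenOffice/hunspell Norwegian dictionary) are installed in
-- $SHAREDIR/tsearch_data. The file names are illustrative only.
CREATE TEXT SEARCH DICTIONARY no_ispell (
    TEMPLATE  = ispell,
    DictFile  = nb_no,
    AffFile   = nb_no,
    StopWords = norwegian
);

-- Consult no_ispell first; tokens it does not recognise are passed on to the
-- Snowball stemmer, matching the "{no_ispell,norwegian_stem}" column above.
ALTER TEXT SEARCH CONFIGURATION norwegian
    ALTER MAPPING FOR asciiword, word
    WITH no_ispell, norwegian_stem;
```

With such a mapping, `to_tsvector('norwegian', 'sjokoladefabrikk') @@ to_tsquery('norwegian', 'fabrikk')` would be expected to match, because the compound is indexed together with its parts; this is what makes decomposition a partial substitute for a "*fabrikk" truncation search. Words that fall through to norwegian_stem, such as hengesmykke, are indexed as a single lexeme (hengesmykk) and so would not match a query for smykke.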
--
Tommy Gildseth