From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
---|---|
To: | Alexander Presber <aljoscha(at)weisshuhn(dot)de> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: TSearch2 / German compound words / UTF-8 |
Date: | 2006-01-27 17:00:44 |
Message-ID: | Pine.GSO.4.63.0601271959350.27734@ra.sai.msu.su |
Lists: | pgsql-general |
Alexander,
could you try tsearch2 from CVS HEAD ?
tsearch2 in 8.1.X doesn't support UTF-8; where it seems to work, it does so
only by accident :)
Oleg
On Fri, 27 Jan 2006, Alexander Presber wrote:
>>> Tsearch/ispell is not able to break this word into parts, because of the
>>> "s" in "Produktion/s/intervall". Misspelling the word as
>>> "Produktionintervall" fixes it:
>> The affix should be marked as an 'affix in the middle of a compound word'.
>> The flag is '~'; see the norsk dictionary for an example:
>>
>> flag ~\\:
>> [^S] > S #~ advarsel > advarsels-
>>
>> BTW, we develop and debug compound word support on the norsk (Norwegian)
>> dictionary, so look there for examples. We don't know Norwegian ourselves;
>> Norwegians helped us :)
>
> Hello everyone!
>
> I cannot get this to work, neither in a German version nor with the
> Norwegian example supplied on the tsearch website.
> That means, just like Hannes, I can get compound word support only without the
> inserted 's' in German and Norwegian:
> "Vertragstrafe" works, but not "Vertragsstrafe", which is the correct form.
>
> So I tried it the other way around: My dictionary consists of two words:
>
> ---
> vertrag/zs
> strafe/z
> ---
>
> My affix file just switches on compounds and allows for s-insertion as
> described in the Norwegian tutorial:
>
> ---
> compoundwords controlled z
> suffixes
> flag s:
> [^S] > S # does not end in "s": append "s" and use in compound check ("Recht" > "Rechts-")
> ---
>
> ts_debug yields:
>
> tstest=# SELECT tsearch2.ts_debug('vertragstrafe strafevertrag vertragsstrafe');
> ts_debug
> -------------------------------------------------------------------------------------
> (german,lword,"Latin word",vertragstrafe,"{ispell_de,simple}","'strafe' 'vertrag'")
> (german,lword,"Latin word",strafevertrag,"{ispell_de,simple}","'strafe' 'vertrag'")
> (german,lword,"Latin word",vertragsstrafe,"{ispell_de,simple}",'vertragsstrafe')
> (3 rows)
>
> I would say the ispell compound support does not honor the s flag in
> compounds.
> Could it be that this feature got lost in a regression? It must have worked
> for Norwegian once. (Take the "overtrekksgrilldresser" example from the
> tsearch2:compounds tutorial, which I cannot reproduce.)
>
> Any hints?
>
> Alexander
>
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
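
For readers following the thread, the behaviour the quoted message expects from the s flag can be sketched outside PostgreSQL. The Python below is only a toy model under stated assumptions, not tsearch2's ispell code: the names DICTIONARY, linking_forms and split_compound are hypothetical, the flag "z" is read as "may take part in compounds", and the flag "s" as "may append a linking s when followed by another part", mirroring the rule `[^S] > S`.

```python
# Toy model of compound splitting with a linking "s".
# Hypothetical illustration only; this is not how tsearch2 is implemented.

# The two-word dictionary from the quoted message: word -> set of flags.
DICTIONARY = {
    "vertrag": {"z", "s"},
    "strafe": {"z"},
}


def linking_forms(word, flags):
    """Return the forms a word may take as a non-final part of a compound."""
    forms = {word}
    # Mirrors the affix rule "[^S] > S": append "s" unless the word ends in "s".
    if "s" in flags and not word.endswith("s"):
        forms.add(word + "s")
    return forms


def split_compound(token, dictionary=DICTIONARY):
    """Return the list of stems making up token, or None if no split exists."""
    for word, flags in dictionary.items():
        if "z" not in flags:
            continue  # only words flagged "z" may take part in compounds
        if token == word:
            return [word]  # final part: plain form only, no linking "s"
        for form in linking_forms(word, flags):
            # A linking form must be followed by at least one more part.
            if token.startswith(form) and len(token) > len(form):
                rest = split_compound(token[len(form):], dictionary)
                if rest is not None:
                    return [word] + rest
    return None


if __name__ == "__main__":
    for token in ("vertragstrafe", "strafevertrag", "vertragsstrafe"):
        print(token, split_compound(token))
```

Under these assumptions all three test words from the message split into 'vertrag' and 'strafe' (in the order they appear in the word), which is the result the quoted message hoped to see from ts_debug for "vertragsstrafe".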