From: | Alexander Presber <aljoscha(at)weisshuhn(dot)de> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: TSearch2 / German compound words / UTF-8 |
Date: | 2006-01-27 14:11:13 |
Message-ID: | 6AC64576-AEB6-47C0-AA8C-0242F9296BEA@weisshuhn.de |
Lists: | pgsql-general |
>> Tsearch/ispell is not able to break this word into parts, because
>> of the "s" in "Produktion/s/intervall". Misspelling the word as
>> "Produktionintervall" fixes it:
> The affix should be marked as 'affix in middle of compound word'.
> The flag is '~'; for an example, look in the norsk dictionary:
>
> flag ~\\:
> [^S] > S #~ advarsel > advarsels-
>
> BTW, we developed and debugged compound word support on the norsk
> (Norwegian) dictionary, so look there for an example. We don't
> know Norwegian ourselves; Norwegians helped us :)
Hello everyone!
I cannot get this to work, neither with a German dictionary nor with the
Norwegian example supplied on the tsearch website.
That is, just like Hannes, I only get compound word support when no 's'
is inserted, in both German and Norwegian:
"Vertragstrafe" works, but "Vertragsstrafe", which is the correct
form, does not.
So I tried it the other way around, starting from a minimal setup. My
dictionary consists of two words:
---
vertrag/zs
strafe/z
---
My affix file just switches on compounds and allows for s-insertion,
as described in the Norwegian tutorial:
---
compoundwords controlled z
suffixes
flag s:
[^S] > S # does not end in "s": append "s" and include in compound check ("Recht" > "Rechts-")
---
ts_debug yields:
tstest=# SELECT tsearch2.ts_debug('vertragstrafe strafevertrag vertragsstrafe');
ts_debug
-------------------------------------------------------------------------------------
(german,lword,"Latin word",vertragstrafe,"{ispell_de,simple}","'strafe' 'vertrag'")
(german,lword,"Latin word",strafevertrag,"{ispell_de,simple}","'strafe' 'vertrag'")
(german,lword,"Latin word",vertragsstrafe,"{ispell_de,simple}",'vertragsstrafe')
(3 rows)
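It looks to me as if the ispell compound support does not honor the 's'
flag inside compounds: "vertragsstrafe" falls through to the simple
dictionary unsplit, while the two forms without the linking 's' are
split correctly.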
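To make the expectation concrete, here is a minimal Python sketch of the splitting I would expect from the two files above. It is only an illustration under my own assumptions: the names DICTIONARY, compound_forms and split_compound are made up for this example, the rule is written in lower case, and nothing here reflects how tsearch2 actually implements compound handling.

```python
import re

# Hypothetical stand-ins for the dictionary and affix file above;
# these are NOT tsearch2 data structures.
DICTIONARY = {"vertrag": "zs", "strafe": "z"}  # word -> flags
COMPOUND_FLAG = "z"                            # "compoundwords controlled z"
LINK_FLAG = "s"                                # "flag s:"
LINK_RULE = (re.compile(r"[^s]$"), "s")        # "[^S] > S"


def compound_forms(word, flags):
    """Forms a word may take as a non-final part of a compound."""
    if COMPOUND_FLAG not in flags:
        return []
    forms = [word]
    pattern, suffix = LINK_RULE
    # Add the linking "s" only if the word carries the flag and the
    # rule's condition (does not end in "s") holds.
    if LINK_FLAG in flags and pattern.search(word):
        forms.append(word + suffix)
    return forms


def split_compound(token):
    """Return the list of stems for token, or None if it cannot be split."""
    token = token.lower()
    # The final part must be a plain dictionary word that allows compounding.
    if COMPOUND_FLAG in DICTIONARY.get(token, ""):
        return [token]
    # Otherwise try each stem as the first part, with and without the "s".
    for stem, flags in DICTIONARY.items():
        for form in compound_forms(stem, flags):
            if token.startswith(form) and len(token) > len(form):
                rest = split_compound(token[len(form):])
                if rest is not None:
                    return [stem] + rest
    return None


for word in ("vertragstrafe", "strafevertrag", "vertragsstrafe"):
    print(word, split_compound(word))
```

Under these assumptions all three test words split into "vertrag" and "strafe", whereas something like "strafesvertrag" would not, because "strafe" does not carry the 's' flag. That is the behaviour I was hoping to see from ts_debug.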
Could it be that this feature was lost in a regression? It must have
worked for Norwegian at some point. (For instance, I cannot reproduce
the "overtrekksgrilldresser" example from the tsearch2:compounds tutorial.)
Any hints?
Alexander