From: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | penty(dot)wenngren(at)dgc(dot)se, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: BUG #3730: Creating a swedish dictionary fails |
Date: | 2007-11-09 13:56:02 |
Message-ID: | 20071109135602.GE2768@alvh.no-ip.org |
Lists: | pgsql-bugs |
Tom Lane wrote:
> Penty Wenngren <penty(dot)wenngren(at)dgc(dot)se> writes:
> > I used iconv to convert svenska.aff and svenska.datalist (from
> > iswedish-1.2.1) to UTF-8. The converted files can be found at:
> > http://www.lederhosen.org/swedish.affix
> > http://www.lederhosen.org/swedish.dict
>
> I think the reason it's failing right there is that that line is the
> first affix rule containing a non-ASCII letter, and the rules are
> supposed to only contain letters and certain specific punctuation.
> I suspect you are working in a locale that doesn't think à is a
> letter --- check lc_ctype.
I patched parse_affentry to report the current token and I see this:
alvherre=# CREATE TEXT SEARCH DICTIONARY swedish_ispell (
    TEMPLATE = ispell,
    DictFile = swedish,
    AffFile = swedish,
    StopWords = swedish);
ERROR: syntax error at line 149 (str: "örs
") of affix file "/home/alvherre/Code/CVS/pgsql/install/00orig/share/tsearch_data/swedish.affix"
I am wondering if the newline being included in the token could be
causing a problem.
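
To illustrate what I mean, here is a minimal sketch of trimming the line
ending from a buffer before it is handed to a parser. This is not the
actual parse_affentry code; strip_line_ending is a hypothetical helper
name, and I am only assuming that the affix line arrives as a
NUL-terminated string with the newline still attached.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of PostgreSQL: remove any trailing
 * '\n' or '\r' characters from a NUL-terminated line, in place.
 * Bytes of multibyte UTF-8 characters are never equal to '\n' or
 * '\r', so this is safe for UTF-8 input.
 */
static void
strip_line_ending(char *line)
{
    size_t len = strlen(line);

    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
        line[--len] = '\0';
}
```

With something like this applied first, the token shown in the error
above would be "örs" without the embedded newline, which would at least
tell us whether the newline is what the parser is choking on.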
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.