Re: BUG #10589: hungarian.stop file spelling error

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: "zsoros(at)gmail(dot)com" <zsoros(at)gmail(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #10589: hungarian.stop file spelling error
Date: 2014-06-10 21:08:25
Message-ID: 5337.1402434505@sss.pgh.pa.us
Lists: pgsql-bugs

I wrote:
> [ we seem to have gotten a misencoded version of hungarian.stop ]

Actually, it looks like things are even worse than that: the Hungarian
stemmer code seems to be confused about this too. In the first place,
we've got a LATIN1 version of that stemmer, which I would imagine is
entirely useless; and in the second place, the UTF8 version has no
reference to any non-LATIN1 characters.
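To illustrate the encoding point above: Hungarian orthography includes the double-acute letters ő and ű, which exist in ISO-8859-2 (LATIN2) but not in ISO-8859-1 (LATIN1). The following is only a minimal sketch of how one might check a piece of text for characters LATIN1 cannot represent. The function name `non_latin1_chars` and the sample words are illustrative assumptions, not anything taken from the Snowball or PostgreSQL sources.

```python
def non_latin1_chars(text):
    """Return the distinct characters in text that LATIN1 cannot encode, sorted."""
    missing = set()
    for ch in text:
        try:
            ch.encode("iso-8859-1")
        except UnicodeEncodeError:
            # No LATIN1 byte exists for this character.
            missing.add(ch)
    return sorted(missing)


# Illustrative Hungarian words (an assumption, not from hungarian.stop).
sample = "először különböző gyűrű"
print(non_latin1_chars(sample))

# Words using only á, ö and similar letters fit in LATIN1.
print(non_latin1_chars("más után több"))
```

A stemmer or stop-word list that never mentions any character reported by a check like this cannot be handling ő or ű correctly, which is the symptom described above.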

Again, I'm suspecting this problem goes further than Hungarian,
because the set of stem_ISO_8859_1_foo.c files in
src/backend/snowball/libstemmer/ covers a lot more languages than
I think LATIN1 is meant to cope with. I'm not sure how much of this
is broken in the original Snowball code and how much is our error
while importing the code.

regards, tom lane
