From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Hannu Krosing <hannu(at)skype(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: tsearch in core patch |
Date: | 2007-06-22 14:46:44 |
Message-ID: | 200706221446.l5MEkiO24647@momjian.us |
Lists: | pgsql-hackers |
Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > I very much doubt that the different Spanish variants differ in their
> > stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL, etc.;
> > but in the case of Portuguese I'm not so sure. Maybe there are other
> > examples (like Chinese, but I'm not sure how useful tsearch is for
> > Chinese).
>
> > And the .ISO8859-1 part you don't need at all if you accept that the
> > files are UTF8 by design, as Tom proposed.
>
> Also, the problem we're dealing with here is mainly lack of
> standardization of the encoding part of locale names. AFAIK, just about
> everybody agrees on "es_ES", "ru_RU", etc; it's the part that comes
> after that (if any) that is not too consistent across platforms.
> So I see no problem in distinguishing between pt_PT and pt_BR if it
> turns out we have to. The trick is to not look at any more of the
> locale name than that; and if we standardize on "stopword files are
> UTF8" then I don't think we need to.

OK, so the open question is when we apply this default setting. If we
do it in initdb, we can isolate all the detection there.
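
A minimal C sketch of the matching rule Tom describes, under stated assumptions: the function name `ts_config_for_locale`, the lookup table, and the configuration names in it (including `portuguese_br` and the `simple` fallback) are hypothetical and illustrative only. The sketch looks only at the part of the locale name before any `.` (encoding) or `@` (modifier), and prefers an exact `ll_CC` entry over a bare language entry, so pt_PT and pt_BR could be told apart if that ever turns out to be necessary.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping entry: locale prefix -> text search config name. */
struct ts_locale_map
{
    const char *prefix;   /* "ll" or "ll_CC" */
    const char *cfgname;  /* illustrative configuration name */
};

/* Illustrative table; more specific "ll_CC" entries must come first. */
static const struct ts_locale_map ts_map[] = {
    {"pt_BR", "portuguese_br"},   /* only needed if pt_PT/pt_BR differ */
    {"pt", "portuguese"},
    {"es", "spanish"},            /* one entry covers es_ES, es_AR, ... */
    {"ru", "russian"},
    {"en", "english"},
    {NULL, NULL}
};

/*
 * Return a config name for a locale such as "es_AR.ISO8859-1".
 * Only the text before the first '.' or '@' is examined, so the
 * encoding suffix never affects the result.
 */
const char *
ts_config_for_locale(const char *locale)
{
    size_t len = strcspn(locale, ".@");
    const struct ts_locale_map *m;

    for (m = ts_map; m->prefix != NULL; m++)
    {
        size_t plen = strlen(m->prefix);

        /* Whole-prefix match, or "ll" followed by "_CC". */
        if (plen <= len && strncmp(locale, m->prefix, plen) == 0 &&
            (plen == len || locale[plen] == '_'))
            return m->cfgname;
    }
    return "simple";   /* assumed fallback when nothing matches */
}
```

For example, `es_AR.ISO8859-1` maps to `spanish`, `pt_BR.UTF-8` to `portuguese_br`, `pt_PT` to `portuguese`, and `C` falls back to `simple`. Ordering the table from most to least specific is what lets a regional entry override the plain language entry.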
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +