From: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
---|---|
To: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: suitable text search configuration |
Date: | 2007-10-24 22:17:47 |
Message-ID: | 20071024221747.GA4626@alvh.no-ip.org |
Lists: | pgsql-hackers |
Andrew Dunstan wrote:
>
> Tom Lane wrote:
>> Actually, looking at the examples so far, I'm thinking we should just
>> consider the string up to the first _, period.
I studied the standards a bit to see whether they mandate that locale
names be of the form "language_COUNTRY", and couldn't find anything,
which makes me think it's mostly a (very well established) convention.
So I don't think parsing at the _ should be what we try first.
>> An alternative is to try to match the full locale (es_ES) and then try
>> the language (es) if that wasn't found. That would leave room to put
>> country-by-country exceptions in, but for the moment we'd not have any.
>
> Can anyone point to a real world example where country by country would
> make sense? If we need to distinguish flavors of some languages, I would
> not be at all surprised if this was not by country anyway.
pt_BR versus pt_PT. I'm not sure if it makes a difference to a stemmer,
but maybe to a thesaurus it does ...
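
To make the fallback discussed above concrete, here is a rough sketch in Python (not the actual backend code). It tries the full locale name first and only then the part before the first _. The table contents, the function name pick_config, and the "portuguese_brazil" entry are assumptions made up for illustration; "simple" is used as the default when nothing matches.

```python
from typing import Mapping

# Hypothetical lookup table: locale or language name -> text search configuration.
# "portuguese_brazil" is an invented country-specific exception, for illustration only.
CONFIGS = {
    "es": "spanish",
    "pt": "portuguese",
    "pt_BR": "portuguese_brazil",
}


def pick_config(locale: str,
                configs: Mapping[str, str] = CONFIGS,
                default: str = "simple") -> str:
    """Pick a configuration: full locale first, then language only."""
    # Drop any codeset and modifier: "es_ES.UTF-8@euro" -> "es_ES".
    name = locale.split(".", 1)[0].split("@", 1)[0]

    # First attempt: exact match on the full locale name (e.g. "pt_BR").
    if name in configs:
        return configs[name]

    # Second attempt: the string up to the first "_" (e.g. "pt").
    language = name.split("_", 1)[0]
    return configs.get(language, default)


if __name__ == "__main__":
    for loc in ("es_ES.UTF-8", "pt_BR", "pt_PT", "C"):
        print(loc, "->", pick_config(loc))
```

With a table like this, pt_BR gets its own entry while pt_PT falls back to the plain language entry, so country-by-country exceptions cost nothing until someone actually adds one.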
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.