Re: PostgreSQL, UTF-8 and Mac OS X

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Guido Neitzer <guido(dot)neitzer(at)pharmaline(dot)de>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: PostgreSQL, UTF-8 and Mac OS X
Date: 2005-11-07 15:42:10
Message-ID: 20051107154204.GE841@svana.org
Lists: pgsql-general

On Mon, Nov 07, 2005 at 09:47:21AM -0500, Tom Lane wrote:
> Guido Neitzer <guido(dot)neitzer(at)pharmaline(dot)de> writes:
> > I have linked the LC_COLLATE for de_DE.UTF-8 to the same LC_COLLATE
> > file that works fine with ISO8859-1.
>
> Um ... why would you expect that to work at all? Aren't the collation
> files very dependent on the encoding?

You'd think so, but a stock Mac OS X or FreeBSD install just links the
LC_COLLATE files of the UTF-8 locales to the US-ASCII one. So by default:

de_DE.UTF-8 links to la_LN.US-ASCII
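
If you want to check where a locale's collation file really points on
your own machine, something along these lines should do. This is only a
sketch: the helper name collate_target is made up for illustration, and
/usr/share/locale is an assumed location that may differ on your system.

```python
import os


def collate_target(locale_root, locale_name):
    """Return the name of the locale directory whose LC_COLLATE file is
    actually used once symlinks are resolved (hypothetical helper)."""
    path = os.path.join(locale_root, locale_name, "LC_COLLATE")
    real = os.path.realpath(path)  # follows any chain of symlinks
    return os.path.basename(os.path.dirname(real))


root = "/usr/share/locale"  # assumption: typical BSD / Mac OS X location
name = "de_DE.UTF-8"
if os.path.exists(os.path.join(root, name, "LC_COLLATE")):
    print(name, "->", collate_target(root, name))
else:
    print("no LC_COLLATE found for", name, "under", root)
```

If the printed name differs from the locale you asked about, the
collation table is shared with that other locale.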

All he's done is change it so the UTF-8 locale uses latin9 rather than
ascii ordering. It obviously gives wrong results for actual UTF-8
strings, but the C library there doesn't handle those anyway: multibyte
collation simply isn't supported, so linking files at random won't crash
anything, it just won't sort non-ASCII text correctly.
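
To make the failure mode concrete, here is a toy illustration of what
happens when a one-byte-per-character collation table is applied to
UTF-8 data. It is not the C library's actual algorithm; the table
TOY_LATIN1_TABLE and the helper names are invented for the example.

```python
# Hypothetical toy table: in a Latin-1 style collation, 'Ä' (0xC4) and
# 'ä' (0xE4) are weighted like 'A' and 'a'. Any other byte keeps its value.
TOY_LATIN1_TABLE = {0xC4: ord("A"), 0xE4: ord("a")}


def single_byte_key(data, table):
    """Build a sort key by looking up every byte on its own, the way a
    single-byte collation table would."""
    return [table.get(b, b) for b in data]


def sort_with_table(words, encoding):
    """Sort words after encoding them, using the toy single-byte table."""
    return sorted(
        words,
        key=lambda w: single_byte_key(w.encode(encoding), TOY_LATIN1_TABLE),
    )


words = ["Zebra", "Ärger", "Apfel"]

# Latin-1: 'Ä' is the single byte 0xC4, so the table can weight it like 'A'.
latin1_order = sort_with_table(words, "latin-1")

# UTF-8: 'Ä' is the two bytes 0xC3 0x84. Neither byte is in the table,
# so the word is ordered by raw byte value and lands after 'Zebra'.
utf8_order = sort_with_table(words, "utf-8")

print(latin1_order)
print(utf8_order)
```

Nothing blows up in the UTF-8 case; the non-ASCII word just ends up in
the wrong place, which is the behaviour described above.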

All the more reason to go for something like ICU...
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
