Re: PostgreSQL, UTF-8 and Mac OS X

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Guido Neitzer <guido(dot)neitzer(at)pharmaline(dot)de>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: PostgreSQL, UTF-8 and Mac OS X
Date: 2005-11-07 14:40:22
Message-ID: 20051107144022.GD841@svana.org
Lists: pgsql-general

On Mon, Nov 07, 2005 at 02:28:05PM +0100, Guido Neitzer wrote:
> I think I was the one who asked.
>
> I worked on my locale problem over the weekend and was able to build an
> LC_COLLATE file that actually works with ISO locales, but not with
> UTF-8 (50% progress ... ;-)).

I guess the problem is that you would have to import the entire Unicode
database to make it work. I think the code is multibyte-aware, though;
it's just that no one has done the work.

Disclaimer: I'm working with Linux/Glibc, which has had proper collation
for quite a while now, so I have no real understanding of systems that
don't have it.

> When you test the UNIX utility "sort" on Mac OS X, be aware that the
> pre-installed version ignores locales entirely ... :-( I had to install
> GNU coreutils to get a sort that works with locales, and it also fails
> on UTF-8 but works with ISO encoding/collation, the same as PG does.

Nasty.

> Now I'm not sure whether my own LC_COLLATE file is not appropriate
> for UTF-8 (why not?) or whether the Mac OS X locale system does not
> support UTF-8 at all, as you state.

Hmm, I just went back to the source code (adv_cmds-79.1), and it looks
like collations don't support UTF-8 at all, or any other multibyte encoding.
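
To make the failure mode concrete: if a locale has no multibyte collation, string comparison plausibly degrades to plain byte order (an assumption about the fallback, not something checked against adv_cmds). The Python sketch below contrasts raw UTF-8 byte order with a crude accent-folding key. Both function names are hypothetical, and the folding key is only a rough stand-in: real collation needs the full Unicode collation tables, which is exactly the work nobody has done here.

```python
import unicodedata

words = ["Zebra", "Äpfel", "Apfel"]

def byte_order_sort(items):
    """Sort by raw UTF-8 bytes, roughly what strcmp() gives when no
    multibyte collation is available (assumed fallback)."""
    return sorted(items, key=lambda s: s.encode("utf-8"))

def folded_key(s):
    """Hypothetical, crude stand-in for real collation: decompose (NFD),
    drop combining marks, casefold, then break ties by code point."""
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), s)

print(byte_order_sort(words))         # ['Apfel', 'Zebra', 'Äpfel']
print(sorted(words, key=folded_key))  # ['Apfel', 'Äpfel', 'Zebra']
```

In byte order "Äpfel" lands after "Zebra" because its first UTF-8 byte (0xC3) is larger than any ASCII letter; a language-aware collation would place it next to "Apfel".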

> It will be cool to have locale support directly in PostgreSQL.

Yeah, but it seems a bit lame for an operating system to claim support
for multibyte locales if it can't collate them. :( It supports
everything except collation, so it's obviously not a priority.

> So, just a quick question about switching: is there a problem with
> using ISO8859-15 for now and switching later by dumping the data and
> importing it into a newer version that uses UTF-8? Do I need to do
> some conversion, or how does this work?

If you import as ISO8859-15 now, then when you do the upgrade, simply set
the client encoding to ISO8859-15 (LATIN9 in PostgreSQL's naming) while
loading the dump, and PostgreSQL will convert it all to UTF-8 during the load.
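
The conversion happens on the server: with the client encoding set to LATIN9 (e.g. via `SET client_encoding TO 'LATIN9';`) and a UTF8 database, each incoming ISO8859-15 byte sequence is re-encoded as UTF-8. A minimal Python sketch of that per-value transformation follows; `latin9_to_utf8` is a hypothetical helper for illustration, not part of PostgreSQL.

```python
def latin9_to_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: reinterpret ISO8859-15 (LATIN9) bytes as UTF-8,
    mimicking what the server does when client_encoding is LATIN9 and the
    database encoding is UTF8."""
    # Python's iso8859_15 codec maps all 256 byte values, so decoding cannot fail.
    text = raw.decode("iso8859_15")
    return text.encode("utf-8")

# 0xA4 is the euro sign in ISO8859-15 (the generic currency sign in ISO8859-1).
dumped = b"Preis: 10 \xa4"
converted = latin9_to_utf8(dumped)
print(converted)                  # b'Preis: 10 \xe2\x82\xac'
print(converted.decode("utf-8"))  # Preis: 10 €
```

ASCII bytes pass through unchanged; only the high-half characters grow to two or three bytes in UTF-8.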

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
