Re: Mixing different LC_COLLATE and database encodings

From: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To: kleptog(at)svana(dot)org
Cc: moseley(at)hank(dot)org, pgsql-general(at)postgresql(dot)org
Subject: Re: Mixing different LC_COLLATE and database encodings
Date: 2006-02-21 01:27:15
Message-ID: 20060221.102715.28783242.t-ishii@sraoss.co.jp
Lists: pgsql-general

> On Sat, Feb 18, 2006 at 08:16:07PM -0800, Bill Moseley wrote:
> > Is the Holy Grail encoding and lc_collate settings per column?
>
> Well yes. I've been trying to create a system where you can handle
> multiple collations in the same database. I posted the details to
> -hackers and got part of the way, but it's a lot of work.
>
> As for encodings, to be honest, I'm not sure whether it's a great idea
> to support multiple encodings simultaneously. Things become a lot
> easier if you know everything is the same encoding. If you set the
> client_encoding automatically on startup it has pretty much the same
> effect as having the server always use that encoding. It's just a bit
> of time wasted in conversion, but the client doesn't need to care.
>
> By way of example, see ICU which is an internationalisation library
> we're considering to get consistent locale support over all platforms.
> It supports one encoding, namely UTF-16. It has various functions to
> convert other encodings to or from that, but internally it's all
> UTF-16. So if we do use that, then all encodings (except native UTF-16)
> will need conversion all the time, so you don't buy anything by
> having the server in some random encoding.
>
> The problem of course being that the SQL standard requires some encoding
> support. No-one has really come up with a proposal for that yet. IMHO,
> that's a parser issue more than anything else.

If you are considering allowing only UTF-16 (or any other single
encoding) in the backend, I am strongly against the idea. We Japanese
need native support for our local encodings. Converting between those
encodings and Unicode every time the backend and frontend talk to
each other would be a serious performance hit. Moreover, the
conversion is known not to be round-trip safe, which means some
information can be lost along the way. Another point is the on-disk
format: UTF-16 will require more storage than the local encodings,
and UTF-8 will probably require more as well.

I have a feeling that ICU is good for applications, but not for a
DBMS.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
