Re: UTF8 national character data type support WIP patch and list of open issues.

From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: "Boguk, Maksym" <maksymb(at)fast(dot)au(dot)fujitsu(dot)com>, "Heikki Linnakangas" <hlinnakangas(at)vmware(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UTF8 national character data type support WIP patch and list of open issues.
Date: 2013-09-18 22:46:37
Message-ID: 1191A5384BD641C68D288AF210BEFDA8@maumau
Lists: pgsql-hackers

From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Another point to keep in mind is that UTF16 is not really any easier
> to deal with than UTF8, unless you write code that fails to support
> characters outside the basic multilingual plane. Which is a restriction
> I don't believe we'd accept. But without that restriction, you're still
> forced to deal with variable-width characters; and there's nothing very
> nice about the way that's done in UTF16. So on the whole I think it
> makes more sense to use UTF8 for this.

I agree. I guess the reason Windows, Java, and Oracle chose UTF-16 is that
it was still UCS-2, covering only the BMP, when they chose it. So character
handling was easier and faster thanks to the fixed-width encoding.
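
To illustrate the point about variable width, here is a minimal sketch in Python (not part of the patch; the helper names are made up for this example). It counts how many 16-bit code units and how many UTF-8 bytes a single character needs. A character outside the BMP takes a surrogate pair in UTF-16, so neither encoding is fixed-width once such characters are supported.

```python
def utf16_code_units(ch: str) -> int:
    """Return the number of 16-bit code units ch occupies in UTF-16."""
    # utf-16-le emits no byte order mark, so bytes // 2 is the unit count.
    return len(ch.encode("utf-16-le")) // 2


def utf8_bytes(ch: str) -> int:
    """Return the number of bytes ch occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# Hypothetical sample characters chosen for illustration:
samples = [
    "A",           # U+0041, ASCII
    "\u3042",      # U+3042, inside the BMP
    "\U00020BB7",  # U+20BB7, outside the BMP
]

for ch in samples:
    print(f"U+{ord(ch):04X}: "
          f"{utf16_code_units(ch)} UTF-16 unit(s), "
          f"{utf8_bytes(ch)} UTF-8 byte(s)")
```

Code that assumes one UTF-16 code unit per character works for the first two samples and breaks on the third, which is the restriction Tom describes.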

Regards
MauMau
