Re: Unicode string literals versus the world

From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unicode string literals versus the world
Date: 2009-04-16 10:51:13
Message-ID: 20090416105113.GK12225@frubble.xen.chris-lamb.co.uk
Lists: pgsql-hackers

On Wed, Apr 15, 2009 at 11:19:42PM +0300, Marko Kreen wrote:
> On 4/15/09, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Given Martijn's complaint about more-than-16-bit code points, I think
> > the \u proposal is not mature enough to go into 8.4. We can think
> > about some version of that later, if there's enough interest.
>
> I think it would be a good idea. Basically we should pick one of a
> couple of pre-existing sane schemes. Here is a quick summary
> of Python, Perl and Java:
>
> Python [1]:
>
> \uXXXX - 16-bit codepoint
> \UXXXXXXXX - 32-bit codepoint
> \N{char-name} - Character by name

Microsoft has also gone this way in C# [1], although named code points
are not supported there.
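
For concreteness, here is a small Python sketch of the three forms
quoted above. The variable names are only for illustration; the point
is that all three spellings denote the same character, and that
anything beyond 16 bits needs the eight-digit form.

```python
# The three Python escape forms quoted above, all denoting U+03BB.
short_form = "\u03bb"                        # \u takes exactly four hex digits
long_form = "\U000003bb"                     # \U takes exactly eight hex digits
named_form = "\N{GREEK SMALL LETTER LAMDA}"  # official Unicode name (spelled LAMDA)

# All three literals produce the same one-character string.
all_equal = short_form == long_form == named_form

# A code point outside the BMP cannot be written with \uXXXX;
# it needs the eight-digit form (U+1D11E MUSICAL SYMBOL G CLEF).
g_clef = "\U0001d11e"
```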

> Perl [2]:
>
> \x{XXXX..} - {} contains hexadecimal codepoint
> \N{char-name} - Unicode char name

Looks OK, but the 'x' seems somewhat redundant. Why not just:

\{xxxx}

This would follow the BitC [2] project, especially if it were more
like:

\{U+xxxx}

e.g.

\{U+03BB}

would be the lowercase lambda character. An added appeal is that this
(i.e. U+03BB) is how the Unicode Consortium spells code points.
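
To show how little machinery this form would need, here is a rough
Python sketch of expanding such escapes. Everything in it is an
assumption for illustration: the function name expand_braced_escapes
is made up, the limit of one to six hex digits is my guess at a
sensible rule, and it ignores details a real lexer would need (such as
escaped backslashes). It is not taken from BitC or any PostgreSQL
patch.

```python
import re

# Hypothetical syntax from this message: \{U+xxxx}, assumed here to
# allow between one and six hexadecimal digits.
_BRACED_ESCAPE = re.compile(r"\\\{U\+([0-9A-Fa-f]{1,6})\}")


def expand_braced_escapes(text):
    """Replace each braced U+xxxx escape with the character it names."""

    def replace(match):
        code_point = int(match.group(1), 16)
        # Unicode code points stop at U+10FFFF.
        if code_point > 0x10FFFF:
            raise ValueError("code point out of range: U+%X" % code_point)
        return chr(code_point)

    return _BRACED_ESCAPE.sub(replace, text)
```

With this, expand_braced_escapes(r"\{U+03BB}") gives the lambda
character, and a code point outside the BMP such as \{U+1D11E} needs no
separate long form.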

> Java [3]:
>
> \uXXXX - 16-bit codepoint

AFAIK, Java isn't the best reference to choose; it assumed from an early
point in its design that Unicode characters were at most 16 bits wide,
and hence later had to switch its internal representation to UTF-16. I
don't program enough Java these days to know how it has all worked out,
but it would be interesting to hear from people who regularly have to
deal with characters outside the BMP (i.e. code points greater than
65535).
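
For anyone unfamiliar with what UTF-16 does to such characters, the
following Python sketch computes the surrogate pair for a code point
outside the BMP. The function name utf16_surrogates is made up for this
example; the arithmetic is the standard UTF-16 encoding rule. A scheme
limited to \uXXXX would have to spell such a character as these two
16-bit units.

```python
def utf16_surrogates(code_point):
    """Return the (high, low) UTF-16 surrogate pair for a non-BMP code point."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point: U+%X" % code_point)
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


# U+1D11E MUSICAL SYMBOL G CLEF as a surrogate pair.
high, low = utf16_surrogates(0x1D11E)
```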

--
Sam http://samason.me.uk/

[1] http://msdn.microsoft.com/en-us/library/aa664669(VS.71).aspx
[2] http://www.bitc-lang.org/docs/bitc/spec.html#stringlit
