Re: Patch for collation using ICU

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Palle Girgensohn <girgen(at)pingpong(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Patch for collation using ICU
Date: 2005-05-07 14:14:41
Message-ID: 200505071414.j47EEfZ02040@candle.pha.pa.us
Lists: pgsql-hackers

Palle Girgensohn wrote:
> >> This is because in the standard postgres implementation, upper/lower is
> >> done one character at a time. A proper upper/lower cannot do it that
> >> way. Another known example is in Turkish, where an ? (?) should look
> >> different depending on whether it is an initial letter or not. This
> >> fails in standard postgresql on all platforms.
> >
> > Uh, where do you see that? Our code has:
> >
> > workspace = texttowcs(string);
> >
> > for (i = 0; workspace[i] != 0; i++)
> >     workspace[i] = towupper(workspace[i]);
>
> As you see, the loop runs towupper on one character at a time. It cannot
> consider whether the letter is the initial, as required in Turkish, and it
> cannot convert one character into two ('ß' -> 'SS').

Oh, OK. I thought texttowcs() would expand the string to allow such
conversions.
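
To make the limitation concrete, here is a minimal sketch in C. The first function is the one-to-one loop quoted above; the second, upper_expanding(), is a hypothetical helper (not part of PostgreSQL or of the patch) that hard-codes only the 'ß' -> "SS" case and writes into a separate, larger buffer. A real implementation would take its special cases from ICU or from Unicode data rather than hard-coding them.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* One-to-one mapping, as in the quoted loop: every input character is
 * replaced by exactly one output character, so the length never changes. */
void upper_per_char(wchar_t *s)
{
    for (size_t i = 0; s[i] != L'\0'; i++)
        s[i] = (wchar_t) towupper((wint_t) s[i]);
}

/* Hypothetical helper: uppercases into a separate buffer so that one input
 * character may become two. Only U+00DF (sharp s) is special-cased here.
 * Returns the number of characters written, excluding the terminator,
 * or (size_t) -1 if the output buffer is too small. */
size_t upper_expanding(const wchar_t *in, wchar_t *out, size_t outcap)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);

    for (size_t i = 0; in[i] != L'\0'; i++)
    {
        if (in[i] == 0x00DF)
        {
            /* need room for two characters plus the terminator */
            if (n + 2 >= outcap)
                return (size_t) -1;
            out[n++] = L'S';
            out[n++] = L'S';
        }
        else
        {
            /* need room for one character plus the terminator */
            if (n + 1 >= outcap)
                return (size_t) -1;
            out[n++] = (wchar_t) towupper((wint_t) in[i]);
        }
    }

    if (n >= outcap)
        return (size_t) -1;
    out[n] = L'\0';
    return n;
}
```

With the six-character input "straße", upper_per_char() still leaves six characters, because towupper() has no single-character uppercase form to return for 'ß'. upper_expanding() produces the seven-character "STRASSE", which is only possible because the output buffer is separate from, and larger than, the input.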

> >> > We have deprecated UNICODE in 8.1 in favor of UTF8 (no dash). Does
> >> > that help?
> >>
> >> I'm aware of that. It might help for unicode, but there are a bunch of
> >> other encodings. IANA has decided that utf-8 has *no* aliases, hence
> >> only utf-8 (with dash, but case insensitive) is accepted. Perhaps ICU is
> >> forgiving, I don't remember/know, but I think we need the mappings,
> >> unfortunately.
> >
> > OK. I guess I am just confused why the native implementations are OK.
>
> They're OK since they understand that UNICODE (or UTF8) is really utf-8.
> Problem is the strings used to describe them are not understood by ICU.
>
> BTW, the pg_enc2iananame_tbl is only used *from* internal representation
> *to* IANA, not the other way around. Maybe that fact lowers the rate of
> confusion? ;-)

OK, got it. I am still a little confused why every native
implementation understands our existing names but ICU does not.
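
For reference, a one-way mapping of the kind described above could look roughly like the sketch below. This is not the patch's actual pg_enc2iananame_tbl, whose contents are not shown in this thread. The struct, the function names and the four entries are assumptions chosen for illustration, using encoding names PostgreSQL accepts and their IANA-registered counterparts.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical table entry: PostgreSQL's name and the IANA charset name. */
typedef struct
{
    const char *pg_name;
    const char *iana_name;
} enc_name_map;

/* Illustrative entries only; the real table would cover every encoding. */
static const enc_name_map enc_to_iana[] = {
    {"UTF8", "UTF-8"},
    {"UNICODE", "UTF-8"},
    {"LATIN1", "ISO-8859-1"},
    {"EUC_JP", "EUC-JP"},
};

/* Case-insensitive comparison of two ASCII names. */
static int names_equal_ci(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0')
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;    /* equal only if both strings ended together */
}

/* Maps an internal encoding name to its IANA name, or NULL if unknown.
 * The lookup goes in one direction only: internal name to IANA name. */
const char *pg_name_to_iana(const char *pg_name)
{
    size_t count = sizeof enc_to_iana / sizeof enc_to_iana[0];

    for (size_t i = 0; i < count; i++)
    {
        if (names_equal_ci(pg_name, enc_to_iana[i].pg_name))
            return enc_to_iana[i].iana_name;
    }
    return NULL;
}
```

Under these assumptions both "UNICODE" and "UTF8" resolve to "UTF-8", which is the spelling IANA registers, and an unknown name yields NULL so the caller can report an error instead of passing an unrecognized string on.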

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
