From: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | Palle Girgensohn <girgen(at)pingpong(dot)net> |
Cc: | John Hansen <john(at)geeknet(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Patch for collation using ICU |
Date: | 2005-05-07 14:06:43 |
Message-ID: | 200505071406.j47E6h600785@candle.pha.pa.us |
Lists: | pgsql-hackers |
Palle Girgensohn wrote:
>
> --On Saturday, May 07, 2005 23.15.29 +1000 John Hansen <john(at)geeknet(dot)com(dot)au>
> wrote:
>
> > Btw, I had been planning to propose replacing every single one of the
> > built in charset conversion functions with calls to ICU (thus making pg
> > _depend_ on ICU), as this would seem like a cleaner solution than for us
> > to maintain our own conversion tables.
> >
> > ICU also has a fair few conversions that we do not have at present.
That is a much larger issue, similar to our shipping our own timezone
database. What does it buy us?

    o  Do we ship it in our tarball?
    o  Is the license compatible?
    o  Does it remove the utils/mb conversions?
    o  Does it allow us to index LIKE (next high char)?
    o  Does it allow us to support multiple encodings in
       a single database more easily?
    o  Does it improve performance?
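
For the LIKE item, "next high char" presumably refers to turning a prefix match such as LIKE 'abc%' into a range an index can serve: everything >= 'abc' and < 'abd'. Below is a minimal Python sketch of that idea, assuming strings compare by plain code point order (as Python's str does). The name prefix_range is hypothetical and is not a PostgreSQL function. Under a locale-aware collation the next-higher string is not simply the last character plus one, which is why the question matters for ICU.

```python
from typing import Optional, Tuple

MAX_CODE_POINT = 0x10FFFF


def prefix_range(prefix: str) -> Tuple[str, Optional[str]]:
    """Return (low, high) so that low <= s < high for every s starting with prefix.

    Assumes code point ordering. high is None when no upper bound exists
    (empty prefix, or a prefix made only of the highest code point).
    """
    chars = list(prefix)
    while chars:
        code = ord(chars[-1])
        if code < MAX_CODE_POINT:
            nxt = code + 1
            # Skip the surrogate block, which does not hold real characters.
            if 0xD800 <= nxt <= 0xDFFF:
                nxt = 0xE000
            chars[-1] = chr(nxt)
            return prefix, "".join(chars)
        # Last character cannot be incremented: drop it and carry leftwards.
        chars.pop()
    return prefix, None


low, high = prefix_range("abc")
print(low, high)
```

With these bounds, a prefix search becomes a range comparison that a sorted structure can answer without scanning every row.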
> I just had a similar thought. And why use ICU only for multibyte charsets?
> If I use LATIN1, I still expect upper('ß') => SS, and I don't get it...
> Same for the Turkish example.
We assume the native toupper() can handle single-byte character
encodings. We use towupper() only for wide character sets.
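
One reason the 'ß' example fails either way: toupper() maps one character to exactly one character, so it cannot produce the two-character 'SS'. Here is a minimal Python sketch of the distinction. Python's str.upper() applies the full Unicode case mapping, while simple_upper is a hypothetical helper that only imitates a one-to-one, toupper()-style mapping. It is not PostgreSQL code.

```python
def simple_upper(text: str) -> str:
    """Uppercase one character at a time, keeping only one-to-one mappings.

    Hypothetical helper: it imitates the shape of C's toupper(), which can
    return only a single character for each input character.
    """
    out = []
    for ch in text:
        mapped = ch.upper()
        # Ignore mappings that expand to several characters (e.g. 'ß' -> 'SS').
        out.append(mapped if len(mapped) == 1 else ch)
    return "".join(out)


word = "straße"
one_to_one = simple_upper(word)  # 'ß' is left unchanged
full = word.upper()              # full case mapping expands 'ß' to 'SS'
print(one_to_one, full)
```

The full mapping changes the string length, which a per-character interface cannot express.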
--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073