Re: Patch for collation using ICU

From: "John Hansen" <john(at)geeknet(dot)com(dot)au>
To: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Palle Girgensohn" <girgen(at)pingpong(dot)net>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch for collation using ICU
Date: 2005-05-07 14:16:37
Message-ID: 5066E5A966339E42AA04BA10BA706AE50A9307@rodrick.geeknet.com.au
Lists: pgsql-hackers

Bruce Momjian wrote:
> Palle Girgensohn wrote:
> >
> > --On lördag, maj 07, 2005 23.15.29 +1000 John Hansen
> > <john(at)geeknet(dot)com(dot)au>
> > wrote:
> >
> > > Btw, I had been planning to propose replacing every single one of
> > > the built in charset conversion functions with calls to ICU (thus
> > > making pg _depend_ on ICU), as this would seem like a cleaner
> > > solution than for us to maintain our own conversion tables.
> > >
> > > ICU also has a fair few conversions that we do not have at present.
>
> That is a much larger issue, similar to our shipping our own
> timezone database. What does it buy us?
>
> o Do we ship it in our tarball?
> o Is the license compatible?
> o Does it remove utils/mb conversions?
> o Does it allow us to index LIKE (next high char)?
> o Does it allow us to support multiple encodings in
> a single database easier?
> o performance?
>
> > I just had a similar thought. And why use ICU only for multibyte charsets?
> > If I use LATIN1, I still expect upper('ß') => SS, and I don't get it...
> > Same for the Turkish example.
>
> We assume the native toupper() can handle single-byte
> character encodings. We use towupper() only for wide character sets.

That assumption is wrong,...

Encoding latin1
Locale <> de*

Select Upper('ß'); (lowercase German sharp s)
Should return SS, but returns ß
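
A rough sketch of why a per-character call cannot get this right: a toupper()-style function hands back exactly one character for each input character, so it has no way to expand ß into SS. The Python below only illustrates that limitation. bytewise_upper and full_upper are hypothetical helper names, and the per-byte loop stands in for a single-byte toupper(); it is not PostgreSQL's actual code.

```python
def bytewise_upper(data: bytes) -> bytes:
    """Uppercase LATIN1 bytes one at a time, the way a toupper()-style
    API has to: one byte in, exactly one byte out. (Illustrative stand-in.)"""
    out = bytearray()
    for b in data:
        upper = bytes([b]).decode("latin-1").upper()
        try:
            encoded = upper.encode("latin-1")
        except UnicodeEncodeError:
            # The uppercase form does not exist in LATIN1; keep the input byte.
            encoded = bytes([b])
        # If the mapping expands (ß -> SS), a one-to-one API cannot
        # return it, so the input byte is passed through unchanged.
        out += encoded if len(encoded) == 1 else bytes([b])
    return bytes(out)


def full_upper(data: bytes) -> str:
    """Uppercase the whole string with full case mapping, which is
    allowed to change the length of the result."""
    return data.decode("latin-1").upper()


word = "straße".encode("latin-1")
print(bytewise_upper(word).decode("latin-1"))  # ß survives unchanged
print(full_upper(word))                        # ß expands to SS
```

A string-level interface, which is what ICU offers, can return a result longer than its input, and that is the property the ß case needs.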

... John
