From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Marko Kreen <markokr(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Latest on CITEXT 2.0 |
Date: | 2008-07-01 15:25:07 |
Message-ID: | 200807011525.m61FP7221773@momjian.us |
Lists: | pgsql-hackers |
Marko Kreen wrote:
> On 7/1/08, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > "Marko Kreen" <markokr(at)gmail(dot)com> writes:
> > > On 6/26/08, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > >> BTW, I don't think you can use that same-length optimization for
> > >> citext. There's no reason to think that upper/lowercase pairs will
> > >> have the same length all the time in multibyte encodings.
> >
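To make Tom's point concrete, the sketch below computes UTF-8 byte lengths for a few uppercase/lowercase pairs taken from the Unicode simple case mappings (UnicodeData.txt). The pair table and the function names are illustrative assumptions. They are not code from citext or PostgreSQL, and whether a particular locale's towlower() applies exactly these mappings varies by platform.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes needed to encode a Unicode code point in UTF-8. */
static int utf8_len(unsigned long cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80)
        return 1;
    if (cp < 0x800)
        return 2;
    if (cp < 0x10000)
        return 3;
    return 4;
}

/* Illustrative pairs from the Unicode simple case mappings. */
struct case_pair {
    unsigned long upper; /* uppercase code point */
    unsigned long lower; /* its simple lowercase mapping */
};

static const struct case_pair pairs[] = {
    {0x0041, 0x0061}, /* LATIN CAPITAL LETTER A: 1 byte -> 1 byte */
    {0x212A, 0x006B}, /* KELVIN SIGN -> 'k': 3 bytes -> 1 byte */
    {0x0130, 0x0069}, /* CAPITAL I WITH DOT ABOVE -> 'i': 2 bytes -> 1 byte */
    {0x023A, 0x2C65}, /* CAPITAL A WITH STROKE: 2 bytes -> 3 bytes */
};

/* Count the pairs whose UTF-8 byte length changes when lowercased. */
static int byte_length_changes(void)
{
    int changes = 0;
    for (size_t i = 0; i < sizeof pairs / sizeof pairs[0]; i++)
        if (utf8_len(pairs[i].upper) != utf8_len(pairs[i].lower))
            changes++;
    return changes;
}
```

Three of the four pairs change byte length, in both directions. For example, the one-character strings U+212A (3 bytes) and "k" (1 byte) lowercase to the same string even though their byte lengths differ.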
> > > What about this code in current str_tolower():
> >
> > > /* Output workspace cannot have more codes than input bytes */
> > > workspace = (wchar_t *) palloc((nbytes + 1) * sizeof(wchar_t));
> >
> >
> > That's working with wchars, not bytes.
>
> Ah, I missed the point of char2wchar() line.
>
> I'm rather unfamiliar with various MB API-s, sorry.
>
> There's another thing I'm probably missing: does current code handle
> multi-wchar codepoints? Or is it guaranteed they don't happen?
> (Wasn't wchar_t usually 16bit value?)
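Some background on the wchar_t question: C does not fix the width of wchar_t. It is 16 bits on Windows, which uses UTF-16, and 32 bits on glibc. With a 16-bit wchar_t, a code point above U+FFFF takes two units (a surrogate pair). The sketch below assumes UTF-8 input and UTF-16 wide characters. It checks that even then no character needs more 16-bit units than it has UTF-8 bytes, so a workspace of nbytes + 1 units would still be large enough in that case. The helper names are hypothetical, and the thread does not say how PostgreSQL's char2wchar() handles surrogates.

```c
#include <assert.h>

/* Bytes needed to encode a code point in UTF-8. */
static int utf8_bytes(unsigned long cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* Encode a code point as UTF-16 into out[]; returns the number of
   16-bit units written (2 for a surrogate pair above U+FFFF). */
static int utf16_encode(unsigned long cp, unsigned short out[2])
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp <= 0xFFFF) {
        out[0] = (unsigned short) cp;
        return 1;
    }
    cp -= 0x10000;
    out[0] = (unsigned short) (0xD800 + (cp >> 10));   /* high surrogate */
    out[1] = (unsigned short) (0xDC00 + (cp & 0x3FF)); /* low surrogate */
    return 2;
}

/* Returns 1 if no code point needs more UTF-16 units than UTF-8 bytes. */
static int units_never_exceed_bytes(void)
{
    unsigned short buf[2];
    for (unsigned long cp = 0; cp <= 0x10FFFF; cp++) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            continue; /* surrogate code points are not characters */
        if (utf16_encode(cp, buf) > utf8_bytes(cp))
            return 0;
    }
    return 1;
}
```

A 4-byte UTF-8 sequence becomes a 2-unit surrogate pair, so the bound "no more codes than input bytes" holds for UTF-16 as well.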
If you want a simple example of wide-character use, look at
oracle_compat.c::upper() which calls str_toupper() in CVS HEAD.
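Here is a rough, portable illustration of the pattern discussed in this thread: convert to wide characters, case-fold each one, then convert back. It uses only standard C library calls (mbstowcs(), towlower(), wcstombs(), malloc()) in place of PostgreSQL's char2wchar()/wchar2char() and palloc(). The function name mb_tolower is hypothetical, and this is not the code of str_tolower() itself. It only mirrors the sizing argument quoted from it: the wide workspace is bounded by the input byte count. The output buffer, by contrast, is sized from the character count times MB_CUR_MAX, because lowercasing can change a character's byte length.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: lowercase a multibyte string in the current
   LC_CTYPE locale. Returns a malloc'd string, or NULL on error. */
static char *mb_tolower(const char *src)
{
    assert(src != NULL);
    size_t nbytes = strlen(src);

    /* Each character consumes at least one input byte, so nbytes + 1
       wide characters (including the terminator) always suffice. */
    wchar_t *workspace = malloc((nbytes + 1) * sizeof(wchar_t));
    if (workspace == NULL)
        return NULL;
    size_t nchars = mbstowcs(workspace, src, nbytes + 1);
    if (nchars == (size_t) -1) { /* invalid multibyte sequence */
        free(workspace);
        return NULL;
    }

    for (size_t i = 0; i < nchars; i++)
        workspace[i] = (wchar_t) towlower((wint_t) workspace[i]);

    /* Byte lengths may change, so size the output by character count
       times the locale's maximum bytes per character, plus the NUL. */
    size_t result_size = nchars * MB_CUR_MAX + 1;
    char *result = malloc(result_size);
    if (result == NULL) {
        free(workspace);
        return NULL;
    }
    size_t outbytes = wcstombs(result, workspace, result_size);
    free(workspace);
    if (outbytes == (size_t) -1) { /* character not representable */
        free(result);
        return NULL;
    }
    return result;
}
```

A program starts in the "C" locale, where only ASCII letters are folded. For input in the environment's multibyte encoding, call setlocale(LC_CTYPE, "") from <locale.h> first.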
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +