From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Invalid byte sequence for encoding "UTF8", caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS |
Date: | 2011-06-09 14:11:38 |
Message-ID: | BANLkTinP9XBcPuEK=7XPqh=NcOVFBKYDUw@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Jun 9, 2011 at 10:07 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> But now that I re-think about it, I guess what I'm confused about is
>> this code here:
>
>>     if (ch >= 'A' && ch <= 'Z')
>>         ch += 'a' - 'A';
>>     else if (IS_HIGHBIT_SET(ch) && isupper(ch))
>>         ch = tolower(ch);
>>     result[i] = (char) ch;
>
> The expected behavior there is that case-folding of non-ASCII characters
> will occur in single-byte encodings but nothing will happen to
> multi-byte characters. We are relying on isupper() to not return true
> when presented with a character fragment in a multibyte locale.
Based on Jeevan's original message, it seems like that's not always
the case, at least on Windows.
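
For anyone wanting to poke at this outside the backend, here is a minimal standalone sketch of the per-byte folding quoted above. The function name downcase_sketch, the local IS_HIGHBIT_SET definition, and the caller-supplied output buffer are assumptions made for illustration; this is not the actual downcase_truncate_identifier() code. In the default "C" locale isupper() is true only for A-Z, so the bytes of a multibyte character pass through untouched. What a given C library does under other locales is exactly what is in question here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Assumed local definition: true if the top bit of the byte is set. */
#define IS_HIGHBIT_SET(ch) ((unsigned char) (ch) & 0x80)

/*
 * Hypothetical helper: fold the first len bytes of ident into result,
 * one byte at a time, using the logic quoted in this thread.
 * result must have room for len + 1 bytes.
 */
static void
downcase_sketch(const char *ident, size_t len, char *result)
{
    size_t      i;

    assert(ident != NULL && result != NULL);

    for (i = 0; i < len; i++)
    {
        unsigned char ch = (unsigned char) ident[i];

        if (ch >= 'A' && ch <= 'Z')
            ch += 'a' - 'A';            /* plain ASCII fold, locale-independent */
        else if (IS_HIGHBIT_SET(ch) && isupper(ch))
            ch = (unsigned char) tolower(ch);   /* locale-dependent fold */
        result[i] = (char) ch;
    }
    result[len] = '\0';
}
```

Feeding it the five bytes "ABC\xC3\x89" (ABC followed by the UTF-8 encoding of U+00C9) without calling setlocale() folds only the ASCII letters. If isupper() returned true for 0xC3 or 0x89 under some locale, tolower() could rewrite a single byte of the sequence and leave invalid UTF-8 behind, which would match the error in the subject line.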
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company