From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
---|---|
To: | "ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu> |
Cc: | Martin Schäfer <Martin(dot)Schaefer(at)cadcorp(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: UTF-8 encoding problem w/ libpq |
Date: | 2013-06-03 16:22:59 |
Message-ID: | 51ACC2E3.9020309@vmware.com |
Lists: | pgsql-hackers |
On 03.06.2013 18:27, ktm(at)rice(dot)edu wrote:
> On Mon, Jun 03, 2013 at 04:09:29PM +0100, Martin Schäfer wrote:
>>
>>>> If I change the strCreate query and add double quotes around the column
>>>> name, then the problem disappears. But the original name is already in
>>>> lowercase, so I think it should also work without quoting the column name.
>>>> Am I missing some setup in either the database or in the use of libpq?
>>>>
>>>> I’m using PostgreSQL 9.2.1, compiled by Visual C++ build 1600, 64-bit
>>>>
>>>> The database uses:
>>>> ENCODING = 'UTF8'
>>>> LC_COLLATE = 'English_United Kingdom.1252'
>>>> LC_CTYPE = 'English_United Kingdom.1252'
>>>>
>>>> Thanks for any help,
>>>>
>>>> Martin
>>>>
>>>
>>> Hi Martin,
>>>
>>> If you do not want the lowercase behavior, you must put double-quotes
>>> around the column name per the documentation:
>>>
>>> http://www.postgresql.org/docs/9.2/interactive/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
>>>
>>> section 4.1.1.
>>>
>>> Regards,
>>> Ken
>>
>> The original name 'id_äß' is already in lowercase. The backend should leave it unchanged IMO.
>
> Only in utf-8 which needs to be double-quoted for a column name as you have
> seen, otherwise the value will be lowercased per byte.

He *is* using UTF-8. Or trying to, anyway :-). The downcasing in the
backend is supposed to leave bytes with the high bit set alone, i.e. in
UTF-8 encoding, it's supposed to leave ä and ß alone.
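
To illustrate the intended behaviour, here is a minimal sketch of ASCII-only downcasing. It is not the backend's actual code; the function name downcase_ascii_only is hypothetical, and the sketch only shows what "leave high-bit bytes alone" means for a UTF-8 identifier.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, NOT PostgreSQL's implementation: lowercase only the
// ASCII letters A-Z and copy every other byte through unchanged.
std::string downcase_ascii_only(const std::string &ident)
{
    std::string out = ident;
    for (std::size_t i = 0; i < out.size(); i++)
    {
        char c = out[i];
        if (c >= 'A' && c <= 'Z')
            out[i] = static_cast<char>(c + ('a' - 'A'));
        // Bytes with the high bit set (UTF-8 lead and continuation bytes)
        // are never in the A-Z range, so they fall through untouched.
    }
    return out;
}
```

With this rule, the UTF-8 bytes of ä (c3 a4) and ß (c3 9f) come out exactly as they went in, whatever the locale's notion of upper case is.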

I suspect that the conversion to UTF-8, before the string is sent to the
server, is not being done correctly. I'm not sure what's wrong there,
but I'd suggest printing the actual byte sequence sent to the server, to
check if it's in fact valid UTF-8, i.e. replace the PQexec() line with
something like:

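    /* Assuming ToUtf8() returns a std::string by value: keep it in a named
     * variable, because calling c_str() on the temporary would leave s
     * dangling. */
    std::string utf8 = ToUtf8(strCreate.c_str());
    const char *s = utf8.c_str();
    int i;
    for (i = 0; s[i]; i++)
        printf("%02x", (unsigned char) s[i]);
    printf("\n");
    pResult = PQexec(pConn, s);
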
That should contain the UTF-8 byte sequence for äß, "c3a4c39f".
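
If eyeballing the hex dump is error-prone, the check can be automated. The following is a rough sketch under stated assumptions: to_hex and looks_like_utf8 are hypothetical helper names, and the validity test is deliberately simplified (it checks lead bytes and continuation bytes, but does not reject every overlong or surrogate encoding).

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: render each byte as two lowercase hex digits.
std::string to_hex(const std::string &bytes)
{
    std::string out;
    char buf[3];
    for (std::size_t i = 0; i < bytes.size(); i++)
    {
        std::snprintf(buf, sizeof buf, "%02x",
                      static_cast<unsigned char>(bytes[i]));
        out += buf;
    }
    return out;
}

// Hypothetical helper: simplified structural check for UTF-8. Each lead
// byte must be followed by the right number of 10xxxxxx continuation bytes.
bool looks_like_utf8(const std::string &bytes)
{
    std::size_t i = 0;
    while (i < bytes.size())
    {
        unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::size_t extra;
        if (c < 0x80)
            extra = 0;                  // plain ASCII
        else if (c >= 0xC2 && c <= 0xDF)
            extra = 1;                  // two-byte sequence
        else if (c >= 0xE0 && c <= 0xEF)
            extra = 2;                  // three-byte sequence
        else if (c >= 0xF0 && c <= 0xF4)
            extra = 3;                  // four-byte sequence
        else
            return false;               // stray continuation or invalid lead

        if (extra > bytes.size() - i - 1)
            return false;               // sequence truncated at end of input

        for (std::size_t k = 1; k <= extra; k++)
        {
            unsigned char cc = static_cast<unsigned char>(bytes[i + k]);
            if ((cc & 0xC0) != 0x80)
                return false;           // not a continuation byte
        }
        i += extra + 1;
    }
    return true;
}
```

For example, passing the string that is about to be handed to PQexec() through looks_like_utf8() should return true, whereas the Windows-1252 bytes for äß (e4 df) would be rejected.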
- Heikki