UTF-8 encoding problem w/ libpq

From: Martin Schäfer <Martin(dot)Schaefer(at)cadcorp(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: UTF-8 encoding problem w/ libpq
Date: 2013-06-03 14:40:14
Message-ID: 11A8567A97B15648846060F5CD818EB8CAC2253F5E@DEV001EX.Dev.cadcorp.net
Lists: pgsql-hackers

I am trying to create database columns whose names contain umlauts, using the UTF8 client encoding. However, the server seems to mangle the column names: it appears to lowercase each byte of the UTF-8 multi-byte sequence individually.

Here is my code:

const wchar_t *strName = L"id_äß";
wstring strCreate = wstring(L"create table test_umlaut(") + strName + L" integer primary key)";

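PGconn *pConn = PQsetdbLogin("", "", NULL, NULL, "dev503", "postgres", "******");
// PQsetdbLogin returns NULL only when out of memory; a failed connection is reported by PQstatus.
if (!pConn || PQstatus(pConn) != CONNECTION_OK) FAIL;
if (PQsetClientEncoding(pConn, "UTF-8")) FAIL;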

PGresult *pResult = PQexec(pConn, "drop table test_umlaut");
if (pResult) PQclear(pResult);

pResult = PQexec(pConn, ToUtf8(strCreate.c_str()).c_str());
if (pResult) PQclear(pResult);

pResult = PQexec(pConn, "select * from test_umlaut");
if (!pResult) FAIL;
if (PQresultStatus(pResult)!=PGRES_TUPLES_OK) FAIL;
if (PQnfields(pResult)!=1) FAIL;
const char *fName = PQfname(pResult,0);

ShowW("Name: ", strName);
ShowA("in UTF8: ", ToUtf8(strName).c_str());
ShowA("from DB: ", fName);
ShowW("in UTF16: ", ToWide(fName).c_str());

PQclear(pResult);
PQfinish(pConn); // close and free the connection (PQreset would only reconnect)

(ShowA/W call OutputDebugStringA/W, and ToUtf8/ToWide use WideCharToMultiByte/MultiByteToWideChar with CP_UTF8.)

And this is the output generated:

Name: id_äß
in UTF8: id_äß
from DB: id_ã¤ãÿ
in UTF16: id_???
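
The bytes in the "from DB" line match what a single-byte lowercase over the UTF-8 bytes would give. Below is a minimal sketch of that effect, assuming the case folding follows Windows-1252 rules; lower1252 and lowerEachByte are hypothetical helpers written for illustration and are not taken from the PostgreSQL source:

```cpp
#include <string>

// Lowercase one byte as if it were a Windows-1252 character.
// Hypothetical helper; it only models the assumed single-byte case folding.
unsigned char lower1252(unsigned char c) {
    if (c >= 'A' && c <= 'Z')
        return static_cast<unsigned char>(c + 0x20);  // ASCII letters
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7)
        return static_cast<unsigned char>(c + 0x20);  // 0xC0..0xDE except the multiplication sign
    if (c == 0x8A || c == 0x8C || c == 0x8E)
        return static_cast<unsigned char>(c + 0x10);  // S-caron, OE ligature, Z-caron
    if (c == 0x9F)
        return 0xFF;                                  // Y-diaeresis
    return c;
}

// Apply the single-byte lowercase to every byte, ignoring UTF-8 sequence boundaries.
std::string lowerEachByte(const std::string &s) {
    std::string out;
    out.reserve(s.size());
    for (unsigned char c : s)
        out.push_back(static_cast<char>(lower1252(c)));
    return out;
}
```

Under this assumption, the UTF-8 bytes C3 A4 C3 9F become E3 A4 E3 FF. Read as Windows-1252, those bytes are the characters shown in the "from DB" line, and they are no longer valid UTF-8 (the byte FF never occurs in UTF-8), which would explain why the conversion to UTF-16 fails.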

It looks as if the backend treats the name as ANSI-encoded rather than UTF-8.
If I change the strCreate query to put double quotes around the column name, the problem disappears. But the original name is already lowercase, so I think it should also work without quoting.
Am I missing some setup, either in the database or in my use of libpq?
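
For reference, the quoting workaround can be sketched as below. quoteIdent is a hypothetical helper, not part of libpq; it wraps a name in double quotes and doubles any embedded double quote, which is how SQL quoted identifiers are written. libpq also offers PQescapeIdentifier for this purpose, which needs a live connection.

```cpp
#include <string>

// Wrap an identifier in double quotes so the server does not case-fold it.
// Hypothetical helper for illustration; embedded double quotes are doubled.
std::string quoteIdent(const std::string &name) {
    std::string out = "\"";
    for (char c : name) {
        if (c == '"')
            out += '"';  // escape an embedded quote by doubling it
        out += c;
    }
    out += '"';
    return out;
}
```

With the code above, this would be applied to the UTF-8 form of the column name before it is concatenated into the create table statement. Quoted identifiers are case-sensitive, so later queries have to use exactly the same spelling.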

I’m using PostgreSQL 9.2.1, compiled by Visual C++ build 1600, 64-bit

The database uses:
ENCODING = 'UTF8'
LC_COLLATE = 'English_United Kingdom.1252'
LC_CTYPE = 'English_United Kingdom.1252'

Thanks for any help,

Martin
