Re: UTF-8 encoding problem w/ libpq

From: "ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu>
To: Martin Schäfer <Martin(dot)Schaefer(at)cadcorp(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UTF-8 encoding problem w/ libpq
Date: 2013-06-03 14:47:59
Message-ID: 20130603144759.GG2892@aart.rice.edu
Lists: pgsql-hackers

On Mon, Jun 03, 2013 at 03:40:14PM +0100, Martin Schäfer wrote:
> I am trying to create database columns with umlauts, using the UTF8 client encoding. However, the server seems to mess up the column names. In particular, it seems to perform a lowercase operation on each byte of the UTF-8 multi-byte sequence.
>
> Here is my code:
>
> const wchar_t *strName = L"id_äß";
> wstring strCreate = wstring(L"create table test_umlaut(") + strName + L" integer primary key)";
>
> PGconn *pConn = PQsetdbLogin("", "", NULL, NULL, "dev503", "postgres", "******");
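> if (!pConn || PQstatus(pConn) != CONNECTION_OK) FAIL; // PQsetdbLogin returns non-NULL even when the connection fails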
> if (PQsetClientEncoding(pConn, "UTF-8")) FAIL;
>
> PGresult *pResult = PQexec(pConn, "drop table test_umlaut");
> if (pResult) PQclear(pResult);
>
> pResult = PQexec(pConn, ToUtf8(strCreate.c_str()).c_str());
> if (pResult) PQclear(pResult);
>
> pResult = PQexec(pConn, "select * from test_umlaut");
> if (!pResult) FAIL;
> if (PQresultStatus(pResult)!=PGRES_TUPLES_OK) FAIL;
> if (PQnfields(pResult)!=1) FAIL;
> const char *fName = PQfname(pResult,0);
>
> ShowW("Name: ", strName);
> ShowA("in UTF8: ", ToUtf8(strName).c_str());
> ShowA("from DB: ", fName);
> ShowW("in UTF16: ", ToWide(fName).c_str());
>
> PQclear(pResult);
> PQfinish(pConn); // close the connection (PQreset would reconnect instead)
>
> (ShowA/W call OutputDebugStringA/W, and ToUtf8/ToWide use WideCharToMultiByte/MultiByteToWideChar with CP_UTF8.)
>
> And this is the output generated:
>
> Name: id_äß
> in UTF8: id_äß
> from DB: id_ã¤ãÿ
> in UTF16: id_???
>
> It seems like the backend thinks the name is in ANSI encoding, not in UTF-8.
> If I change the strCreate query and add double quotes around the column name, then the problem disappears. But the original name is already in lowercase, so I think it should also work without quoting the column name.
> Am I missing some setup in either the database or in the use of libpq?
>
> I’m using PostgreSQL 9.2.1, compiled by Visual C++ build 1600, 64-bit
>
> The database uses:
> ENCODING = 'UTF8'
> LC_COLLATE = 'English_United Kingdom.1252'
> LC_CTYPE = 'English_United Kingdom.1252'
>
> Thanks for any help,
>
> Martin
>
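To check the byte-wise lowercasing theory, here is a small C++ model of it. It is a sketch under stated assumptions, not PostgreSQL's actual scanner code: FoldByteCp1252, FoldBytewise and IsStructurallyValidUtf8 are hypothetical helpers, and the case table assumes that the uppercase letters of Windows code page 1252 (the LC_CTYPE shown in the report) map to their lowercase forms while every other byte is left alone.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model: lowercase a single byte as if it were a Windows-1252 character.
// Assumption: only the 1252 uppercase letters change; all other bytes pass through.
unsigned char FoldByteCp1252(unsigned char ch) {
    if (ch >= 'A' && ch <= 'Z') return static_cast<unsigned char>(ch + ('a' - 'A'));
    if (ch >= 0xC0 && ch <= 0xDE && ch != 0xD7)  // A-grave..Thorn, skipping the multiplication sign
        return static_cast<unsigned char>(ch + 0x20);
    switch (ch) {
        case 0x8A: return 0x9A;  // S-caron -> s-caron
        case 0x8C: return 0x9C;  // OE ligature -> oe ligature
        case 0x8E: return 0x9E;  // Z-caron -> z-caron
        case 0x9F: return 0xFF;  // Y-diaeresis -> y-diaeresis
        default:   return ch;    // digits, punctuation, currency sign, lowercase letters
    }
}

// Folds a string one byte at a time, ignoring UTF-8 multi-byte boundaries.
std::string FoldBytewise(const std::string &s) {
    std::string out = s;
    for (char &c : out)
        c = static_cast<char>(FoldByteCp1252(static_cast<unsigned char>(c)));
    return out;
}

// Minimal structural UTF-8 check: each lead byte must be followed by the right
// number of continuation bytes (10xxxxxx). Overlong forms are not rejected.
bool IsStructurallyValidUtf8(const std::string &s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra;
        if (lead < 0x80) extra = 0;
        else if ((lead & 0xE0) == 0xC0) extra = 1;
        else if ((lead & 0xF0) == 0xE0) extra = 2;
        else if ((lead & 0xF8) == 0xF0) extra = 3;
        else return false;                         // stray continuation or invalid lead byte
        if (s.size() - i - 1 < extra) return false; // sequence truncated at end of string
        for (std::size_t k = 1; k <= extra; ++k)
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) return false;
        i += extra + 1;
    }
    return true;
}
```

Applied to the UTF-8 bytes of "id_äß" (69 64 5F C3 A4 C3 9F), FoldBytewise produces 69 64 5F E3 A4 E3 FF. Read as code page 1252, those bytes display as "id_ã¤ãÿ", matching the "from DB" line. IsStructurallyValidUtf8 rejects them because the second E3 arrives where a continuation byte is expected, which is consistent with the failed conversion in the "in UTF16" line.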

Hi Martin,

Unquoted identifiers are folded to lower case. If you do not want that
behavior, you must put double quotes around the column name, as described in
section 4.1.1 of the documentation:

http://www.postgresql.org/docs/9.2/interactive/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
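A minimal sketch of building the CREATE TABLE statement with a quoted column name, assuming the name is already UTF-8 encoded in a std::string (the original code converts from wstring with ToUtf8). QuoteIdentifier and BuildCreateTable are hypothetical helpers that follow the quoting rule in section 4.1.1: wrap the name in double quotes and write any embedded double quote twice. libpq itself provides PQescapeIdentifier(conn, str, length) for this job (its result must be released with PQfreemem), which is preferable in real code because it takes the connection's client encoding into account.

```cpp
#include <string>

// Hypothetical helper: quote an SQL identifier by wrapping it in double quotes
// and doubling any embedded double quote. Quoted identifiers are not case-folded.
std::string QuoteIdentifier(const std::string &name) {
    std::string out;
    out.reserve(name.size() + 2);
    out += '"';
    for (char c : name) {
        if (c == '"') out += '"';  // "" inside a quoted identifier stands for one "
        out += c;
    }
    out += '"';
    return out;
}

// Hypothetical helper: the CREATE TABLE from the original post, with the
// UTF-8 column name quoted so its bytes are passed through unchanged.
std::string BuildCreateTable(const std::string &utf8ColumnName) {
    return "create table test_umlaut(" + QuoteIdentifier(utf8ColumnName) +
           " integer primary key)";
}
```

For example, BuildCreateTable applied to the UTF-8 bytes of "id_äß" yields create table test_umlaut("id_äß" integer primary key), the quoted form that Martin reports avoids the problem.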

Regards,
Ken
