Re: UTF-8 encoding problem w/ libpq

From: Martin Schäfer <Martin(dot)Schaefer(at)cadcorp(dot)com>
To: "ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UTF-8 encoding problem w/ libpq
Date: 2013-06-03 15:09:29
Message-ID: 11A8567A97B15648846060F5CD818EB8CAC2253F5F@DEV001EX.Dev.cadcorp.net
Lists: pgsql-hackers

> -----Original Message-----
> From: ktm(at)rice(dot)edu [mailto:ktm(at)rice(dot)edu]
> Sent: 03 June 2013 16:48
> To: Martin Schäfer
> Cc: pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] UTF-8 encoding problem w/ libpq
>
> On Mon, Jun 03, 2013 at 03:40:14PM +0100, Martin Schäfer wrote:
> > I am trying to create database columns with umlauts, using the UTF8 client
> > encoding. However, the server seems to mess up the column names. In
> > particular, it seems to perform a lowercase operation on each byte of the
> > UTF-8 multi-byte sequence.
> >
> > Here is my code:
> >
> > const wchar_t *strName = L"id_äß";
> > wstring strCreate = wstring(L"create table test_umlaut(") +
> > strName + L" integer primary key)";
> >
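> > PGconn *pConn = PQsetdbLogin("", "", NULL, NULL, "dev503", "postgres", "******");
> > // PQsetdbLogin returns a non-NULL PGconn even when connecting fails,
> > // so the connection status must be checked as well.
> > if (!pConn || PQstatus(pConn) != CONNECTION_OK) FAIL;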
> > if (PQsetClientEncoding(pConn, "UTF-8")) FAIL;
> >
> > PGresult *pResult = PQexec(pConn, "drop table test_umlaut");
> > if (pResult) PQclear(pResult);
> >
> > pResult = PQexec(pConn, ToUtf8(strCreate.c_str()).c_str());
> > if (pResult) PQclear(pResult);
> >
> > pResult = PQexec(pConn, "select * from test_umlaut");
> > if (!pResult) FAIL;
> > if (PQresultStatus(pResult)!=PGRES_TUPLES_OK) FAIL;
> > if (PQnfields(pResult)!=1) FAIL;
> > const char *fName = PQfname(pResult,0);
> >
> > ShowW("Name: ", strName);
> > ShowA("in UTF8: ", ToUtf8(strName).c_str());
> > ShowA("from DB: ", fName);
> > ShowW("in UTF16: ", ToWide(fName).c_str());
> >
> > PQclear(pResult);
> > PQreset(pConn);
> >
> > (ShowA/W call OutputDebugStringA/W, and ToUtf8/ToWide use
> > WideCharToMultiByte/MultiByteToWideChar with CP_UTF8.)
> >
> > And this is the output generated:
> >
> > Name: id_äß
> > in UTF8: id_äß
> > from DB: id_ã¤ãÿ
> > in UTF16: id_???
> >
> > It seems like the backend thinks the name is in ANSI encoding, not in UTF-8.
> > If I change the strCreate query and add double quotes around the column
> > name, the problem disappears. But the original name is already in
> > lowercase, so I think it should also work without quoting the column name.
> > Am I missing some setup in either the database or in the use of libpq?
> >
> > I’m using PostgreSQL 9.2.1, compiled by Visual C++ build 1600, 64-bit
> >
> > The database uses:
> > ENCODING = 'UTF8'
> > LC_COLLATE = 'English_United Kingdom.1252'
> > LC_CTYPE = 'English_United Kingdom.1252'
> >
> > Thanks for any help,
> >
> > Martin
> >
>
> Hi Martin,
>
> If you do not want the lowercase behavior, you must put double-quotes
> around the column name per the documentation:
>
> http://www.postgresql.org/docs/9.2/interactive/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
>
> section 4.1.1.
>
> Regards,
> Ken

The original name 'id_äß' is already in lowercase. The backend should leave it unchanged IMO.

Regards,
Martin
