From: | pg(at)kolesar(dot)hu |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | BUG #8105: names are transformed to lowercase incorrectly |
Date: | 2013-04-22 14:12:41 |
Message-ID: | E1UUHU1-0000iG-BT@wrigleys.postgresql.org |
Lists: | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 8105
Logged by: András Kolesár
Email address: pg(at)kolesar(dot)hu
PostgreSQL version: 9.1.5
Operating system: Windows
Description:
If I specify a Unicode field name without quotes, the field name is lowercased
incorrectly. pgAdmin 1.14.2 on Linux, PostgreSQL server 9.1.5 on Windows:
SELECT érték FROM (SELECT 1 AS "érték") AS x;
********** Error **********
SQL state: 42703
Character: 8
In the example above I specify a Unicode column name ("érték" means "value"
in Hungarian) and then try to read it. If I use double quotes in the outer
query, it works.
However, the above example works fine if the server runs on Linux:
"PostgreSQL 9.1.9 on i686-pc-linux-gnu, compiled by gcc (Ubuntu/Linaro
4.7.2-2ubuntu1) 4.7.2, 32-bit"
I see the same problem from a PHP client, where the error message is more
verbose:
ERROR: column "�rt�k" does not exist
LINE 1: SELECT érték FROM (SELECT 1 AS "érték") AS x
               ^
The "é" character is displayed incorrectly in the error message, which shows
where the problem is. This character (U+00E9) is encoded in UTF-8 as C3 A9,
but in the error message it appears as the invalid UTF-8 sequence E3 A9. I
think Windows uses the Windows-1250 or Windows-1252 character set, in which
C3 lowercases to E3. A9 survives tolower() because in these charsets it is
© (the copyright sign), which has no lowercase counterpart.
I have located the problem in the PostgreSQL source:
src/backend/parser/scansup.c:128
char *
downcase_truncate_identifier(const char *ident, int len, bool warn) {
    // ...
    for (i = 0; i < len; i++)
        // ...
        if (IS_HIGHBIT_SET(ch) && isupper(ch))
            ch = tolower(ch);
This function walks through the identifier byte by byte and lowercases each
byte as if it were an individual character. This is incorrect for multibyte
character sets. It works on Linux with a UTF-8 system encoding because
isupper() returns false for both C3 and A9.
The same issue has been reported here:
Database object names and libpq in UTF-8 locale on Windows
http://permalink.gmane.org/gmane.comp.db.postgresql.sql/29464
Solution 1: apply tolower() only to A-Z.
Solution 2: use a lowercasing function that takes client_encoding into account.