From: | Victor Snezhko <snezhko(at)indorsoft(dot)ru> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Corruption of multibyte identifiers on UTF-8 locale |
Date: | 2006-09-23 10:23:52 |
Message-ID: | u4puynao7.fsf@indorsoft.ru |
Lists: | pgsql-bugs |
Hello,
Looks like we have a more serious problem with multibyte identifiers.
When I run the following sequence of queries:
CREATE OR REPLACE FUNCTION CreateOrAlterTable()
RETURNS int
AS $$
BEGIN
    if not EXISTS(SELECT relname FROM pg_class WHERE relname ILIKE 'т1' AND relkind = 'r') then
        CREATE TABLE т1 (
            к1 int NOT NULL,
            PRIMARY KEY (к1)
        );
    end if;
    return 0;
END;
$$ LANGUAGE plpgsql;
SELECT CreateOrAlterTable();
CREATE OR REPLACE FUNCTION CreateOrAlterTable()
RETURNS int
AS $$
BEGIN
    if not EXISTS(SELECT relname FROM pg_class WHERE relname ILIKE 'т2' AND relkind = 'r') then
        CREATE TABLE т2 (
            к2 int NOT NULL,
            PRIMARY KEY (к2)
        );
    end if;
    return 0;
END;
$$ LANGUAGE plpgsql;
and then try to create the second table:
SELECT CreateOrAlterTable();
I get the following error (on HEAD as well as on patched 8.1.4):
ERROR: invalid byte sequence for encoding "UTF8": 0xf18231
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
CONTEXT: SQL statement "SELECT not EXISTS(SELECT relname FROM pg_class WHERE relname ILIKE '?1' AND relkind = 'r')"
PL/pgSQL function "createoraltertable" line 2 at if
The correct UTF-8 byte sequence is 0xd18231, so it looks like we call
tolower() somewhere on individual bytes of multibyte characters, and it
does the same thing as isspace(): it interprets its argument as a wide
character and converts it, turning the lead byte 0xd1 into 0xf1.
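To illustrate the suspected mechanism, here is a minimal sketch in Python that stands in for the C code path. It is an assumption about what happens, not the actual PostgreSQL code: the helper name bytewise_tolower is hypothetical, and it simply treats every byte as a standalone code point in the range U+0000..U+00FF and lowercases it, which is what a wide-character-aware tolower() would do with a single byte.

```python
def bytewise_tolower(data: bytes) -> bytes:
    """Lowercase each byte as if it were a standalone code point (hypothetical helper)."""
    out = bytearray()
    for b in data:
        low = chr(b).lower()
        # Keep the byte unchanged unless lowercasing yields one character below U+0100.
        out.append(ord(low) if len(low) == 1 and ord(low) < 256 else b)
    return bytes(out)


original = "т1".encode("utf-8")        # Cyrillic "т" followed by ASCII "1"
mangled = bytewise_tolower(original)   # lead byte 0xd1 is read as U+00D1 and folded to U+00F1

print(original.hex(), "->", mangled.hex())   # d18231 -> f18231

try:
    mangled.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xf1 announces a four-byte sequence, but only one continuation byte follows.
    print("invalid UTF-8:", exc.reason)
```

The mangled bytes match the sequence reported in the error message above, which is consistent with the byte-at-a-time tolower() theory but does not prove where in the backend the call happens.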
Simple CREATE TABLE statements work, as do CREATE TABLE statements
executed inside a function without the "if not EXISTS" check.
So, either we don't support UTF-8 on BSDs for now (BTW, this needs to
be checked on less popular BSD flavors), or we need to fix this
somehow, e.g., by calling only wide-character checks, which will
complicate things...
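As a rough sketch of what such a fix could look like, again in Python as a stand-in for C, here are two possible directions. Both function names are hypothetical and neither is claimed to be what PostgreSQL does or will do: the first folds only ASCII letters and leaves every byte at or above 0x80 alone, the second decodes to whole characters before folding case, which corresponds to the wide-character approach.

```python
def ascii_only_tolower(data: bytes) -> bytes:
    """Fold only A-Z; bytes that belong to multibyte characters pass through untouched."""
    return bytes(b + 32 if 0x41 <= b <= 0x5A else b for b in data)


def codepoint_tolower(data: bytes, encoding: str = "utf-8") -> bytes:
    """Decode to whole characters first, fold case per character, then re-encode."""
    return data.decode(encoding).lower().encode(encoding)


name = "Т1 ABC".encode("utf-8")   # uppercase Cyrillic "Т", then ASCII text

# ASCII-only folding keeps the Cyrillic letter as-is but never corrupts it.
print(ascii_only_tolower(name).decode("utf-8"))

# Character-aware folding lowercases the Cyrillic letter as well.
print(codepoint_tolower(name).decode("utf-8"))
```

The first variant is cheap but makes case-insensitive matching ASCII-only; the second handles non-ASCII letters correctly at the cost of a decode step on every comparison.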
--
WBR, Victor V. Snezhko
E-mail: snezhko(at)indorsoft(dot)ru