| From: | Victor Snezhko <snezhko(at)indorsoft(dot)ru> | 
|---|---|
| To: | pgsql-bugs(at)postgresql(dot)org | 
| Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
| Subject: | Corruption of multibyte identifiers on UTF-8 locale | 
| Date: | 2006-09-23 10:23:52 | 
| Message-ID: | u4puynao7.fsf@indorsoft.ru | 
| Lists: | pgsql-bugs | 
Hello,
It looks like we have a more serious problem with multibyte identifiers.
When I run the following sequence of queries:
CREATE OR REPLACE FUNCTION CreateOrAlterTable()
RETURNS int
AS $$
BEGIN
  if not EXISTS(SELECT relname FROM pg_class WHERE relname ILIKE 'т1' AND relkind = 'r') then
    CREATE TABLE т1 (
           к1 int NOT NULL,
           PRIMARY KEY (к1)
    );
  end if;
  return 0;
END;
$$ LANGUAGE plpgsql;
SELECT CreateOrAlterTable();
CREATE OR REPLACE FUNCTION CreateOrAlterTable()
RETURNS int
AS $$
BEGIN
  if not EXISTS(SELECT relname FROM pg_class WHERE relname ILIKE 'т2' AND relkind = 'r') then
    CREATE TABLE т2 (
           к2 int NOT NULL,
           PRIMARY KEY (к2)
    );
  end if;
  return 0;
END;
$$ LANGUAGE plpgsql;
and then try to create the second table:
SELECT CreateOrAlterTable();
I get the following error (on HEAD as well as on patched 8.1.4):
ERROR:  invalid byte sequence for encoding "UTF8": 0xf18231
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
CONTEXT:  SQL statement "SELECT not EXISTS(SELECT relname FROM pg_class WHERE relname ILIKE '?1' AND relkind = 'r')"
PL/pgSQL function "createoraltertable" line 2 at if
The correct UTF-8 byte sequence is 0xd18231, so it looks like we call
tolower() somewhere on individual bytes of multibyte characters, and it
does the same thing as isspace(): it interprets its argument as a wide
character and converts it.
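To make the suspected mechanism concrete, here is a minimal Python sketch (not PostgreSQL code). It assumes that the platform's tolower() maps each byte as if it were a Latin-1 character; that is an assumption about the BSD behavior, and the helper names bytewise_lower_latin1 and is_valid_utf8 are hypothetical.

```python
def bytewise_lower_latin1(data: bytes) -> bytes:
    """Lowercase every byte on its own, treating it as a Latin-1 character.

    This mimics (under the stated assumption) a per-byte tolower() call
    applied to a UTF-8 string.
    """
    return data.decode("latin-1").lower().encode("latin-1")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The identifier from the failing query: Cyrillic "т" followed by "1".
original = "т1".encode("utf-8")
corrupted = bytewise_lower_latin1(original)

print(original.hex())            # d18231
print(corrupted.hex())           # f18231
print(is_valid_utf8(corrupted))  # False
```

Under that assumption the lead byte 0xd1 (Latin-1 "Ñ") becomes 0xf1 (Latin-1 "ñ"), which matches the sequence in the error message. 0xf1 introduces a four-byte UTF-8 sequence, so the result is no longer valid UTF-8.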
Plain CREATE TABLE statements work, as do CREATE TABLE statements
executed inside a function without the "EXISTS" check.
So either we don't support UTF-8 on BSDs for now (BTW, this needs to be
checked on less popular BSD flavors), or we need to fix this somehow,
e.g. by calling only wide-character checks, which will complicate
things...
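As a rough illustration of the character-aware alternative, the Python sketch below decodes the bytes into characters before case folding, so no byte of a multibyte character is touched in isolation. It only illustrates the idea and is not a proposed patch; the helper name lower_utf8 is hypothetical.

```python
def lower_utf8(data: bytes) -> bytes:
    """Lowercase UTF-8 text by operating on whole characters, not bytes."""
    return data.decode("utf-8").lower().encode("utf-8")


# An already-lowercase Cyrillic identifier is left untouched.
unchanged = lower_utf8("т1".encode("utf-8"))

# An uppercase Cyrillic identifier is folded to the correct lowercase form.
folded = lower_utf8("Т1".encode("utf-8"))

print(unchanged.hex())  # d18231
print(folded.hex())     # d18231
```

The cost is that every comparison has to decode first, which is roughly the complication mentioned above.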
-- 
WBR, Victor V. Snezhko
E-mail: snezhko(at)indorsoft(dot)ru