From: | Mark Dilger <pgsql(at)markdilger(dot)com> |
---|---|
To: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
Cc: | tgl(at)sss(dot)pgh(dot)pa(dot)us, alvherre(at)commandprompt(dot)com, kleptog(at)svana(dot)org, all(at)adv(dot)magwien(dot)gv(dot)at, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Bug in UTF8-Validation Code? |
Date: | 2007-04-04 15:56:50 |
Message-ID: | 4613CAC2.30807@markdilger.com |
Lists: | pgsql-hackers |
Tatsuo Ishii wrote:
> <SNIP>. I think we need to continue design discussion, probably
> targeting 8.4, not 8.3.
The discussion came about because Andrew - Supernews noticed that chr()
returns invalid utf8, and we're trying to fix all the bugs with invalid
utf8 in the system. Something needs to be done, even if we just check
the result of the current chr() implementation and throw an error on
invalid results. But do we want to make this minor change for 8.3 and
then change it again for 8.4?
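As a rough illustration of the "check the result and throw an error" option, here is a minimal sketch in Python rather than backend C. It assumes that the current chr() simply emits its argument as one raw byte; `raw_chr` and `checked_chr` are hypothetical names used only for this sketch, not functions in the PostgreSQL source.

```python
def raw_chr(code: int) -> bytes:
    # Assumption: models a chr() that returns its argument as a single raw byte.
    return bytes([code & 0xFF])


def checked_chr(code: int) -> bytes:
    # Validate the result against UTF-8 and raise instead of returning bad data.
    result = raw_chr(code)
    try:
        result.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"chr({code}) produced invalid UTF-8: 0x{result.hex()}"
        ) from exc
    return result
```

With this check, values 0 through 127 pass through unchanged, while anything from 128 to 255 is rejected, because a lone byte in that range is never a complete UTF-8 sequence.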
Here's an example of the current problem. It's an 8.2.3 database with
utf8.en_US encoding:
mark=# create table testutf8 (t text);
CREATE TABLE
mark=# insert into testutf8 (t) (select chr(gs) from
generate_series(0,255) as gs);
INSERT 0 256
mark=# \copy testutf8 to testutf8.data
mark=# truncate testutf8;
TRUNCATE TABLE
mark=# \copy testutf8 from testutf8.data
ERROR: invalid byte sequence for encoding "UTF8": 0x80
HINT: This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".
CONTEXT: COPY testutf8, line 129
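The failing line number can be cross-checked outside the database. The sketch below is in Python and assumes that each row was dumped as one line holding the single raw byte for its generate_series value; `first_invalid_line` is a hypothetical helper written only for this illustration.

```python
def first_invalid_line(values):
    # Walk the values in dump order; line numbers start at 1, as COPY reports them.
    for line_no, code in enumerate(values, start=1):
        try:
            bytes([code]).decode("utf-8")
        except UnicodeDecodeError:
            return line_no, code
    return None


print(first_invalid_line(range(256)))  # prints (129, 128)
```

Line 129 corresponds to the value 128, that is, the byte 0x80 named in the error message. It is the first value whose single-byte form is not valid UTF-8.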