From: | Steven Schlansker <steven(at)trumpet(dot)io> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | COPY FROM/TO losing a single byte of a multibyte UTF-8 sequence |
Date: | 2010-08-18 23:11:31 |
Message-ID: | 8F72262C-5694-4626-A87F-00604FB5E1D6@trumpet.io |
Lists: | pgsql-bugs pgsql-hackers |
Hello fine PostgreSQL bug-busters,
I'm having a rather annoying problem: a particular string causes the Postgres COPY functionality to lose a byte, which corrupts backups and transferred data.
First, the environment:
PostgreSQL 8.4.4 on i386-apple-darwin10.3.0, compiled by GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5646) (dot 1), 64-bit
Mac OS X 10.6.4
[steven(at)xxx:~]% psql --version
psql (PostgreSQL) 8.4.4
contains support for command-line editing
Now, the setup:
Name | Owner | Encoding | Collation | Ctype | Access privileges | Size | Tablespace | Description
baddb | xxxxxxx_production | UTF8 | en_US.utf-8 | en_US.utf-8 | | 207 MB | pg_default |
baddb=> create table badtable (a int, b int, c character varying, d character varying, e character varying, f character varying[], g text, h character varying[],i character varying[], j character varying[], k character varying[], l character varying[], m character varying[], n character varying[],o character varying, p character varying);
baddb=> \copy badtable from '/tmp/data.copy'
baddb=> \copy badtable to '/tmp/badness.copy'
baddb=> \copy badtable from '/tmp/badness.copy'
ERROR: invalid byte sequence for encoding "UTF8": 0xcf2c
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
CONTEXT: COPY badtable, line 1
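The value 0xcf2c in the error can be read as a two-byte UTF-8 lead byte (0xcf) followed directly by an ASCII comma (0x2c) where a continuation byte should be. A minimal Python sketch for locating such a spot in a dump; the helper name `first_invalid_utf8` is hypothetical and the sample bytes are made up for illustration, not taken from the actual file:

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, offending_bytes) for the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first bad byte; the following byte is
        # included as well, to match the two-byte form shown in the error above.
        return exc.start, data[exc.start:exc.start + 2]
    return None

# Illustrative input: a lead byte 0xcf whose continuation byte is missing.
sample = b"{hut,\xcf,pizza}"
print(first_invalid_utf8(sample))
```

The same function could be pointed at the contents of '/tmp/badness.copy', read in binary mode, to find the byte offset of the damaged sequence.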
Obviously, this wouldn't be too helpful without the datafile in question:
1 2377510 FOURSQUARE 1403504 Pizza Hut {} \N {} {} {} {Pizza} {πίτσα,hut,food,ζωγράφου,pizza,eat,zografou} {} \N \N \N
Since this is likely to be eaten by various mail clients or lost in translation, please find attached a TGZ of the data file in question.
Attachment | Content-Type | Size |
---|---|---|
data.tgz | application/octet-stream | 260 bytes |
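To narrow down where in the row the byte may have gone missing, one can list every character whose UTF-8 encoding starts with 0xcf and which is immediately followed by a comma. A rough Python sketch, assuming the keyword array survived mail transport exactly as typed below; `suspects` is a hypothetical helper, not part of any PostgreSQL tooling:

```python
def suspects(text: str, lead: int = 0xCF, follower: str = ","):
    """Yield (char, encoded_bytes) for each multibyte character whose UTF-8
    form starts with `lead` and whose next character is `follower`."""
    for ch, nxt in zip(text, text[1:]):
        enc = ch.encode("utf-8")
        if len(enc) > 1 and enc[0] == lead and nxt == follower:
            yield ch, enc

# Keyword array copied from the data line above (assumed intact).
keywords = "{πίτσα,hut,food,ζωγράφου,pizza,eat,zografou}"
for ch, enc in suspects(keywords):
    print(ch, enc.hex())
```

With the string as typed here, the only match is υ (bytes cf 85) at the end of ζωγράφου, which would suggest that 0x85 is the dropped byte. This only narrows down the location; it does not explain why the byte is lost.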