Quick Links

Hex characters in COPY input

From:	Melvin Call <melvincall979(at)gmail(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Hex characters in COPY input
Date:	2015-02-27 02:50:30
Message-ID:	CADGQN55eBFOWJwzcJPdpTHV-MGwmh+W-XgeirE33qrn1p0TFow@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Good evening list,

I am taking the output of a MySQL query and saving it into a text file for COPY
input into a PostgreSQL database. The query gives me a list of addresses. One of
the addresses is in Montreal, and was input using the correct spelling of
Montreal where the e is an accented e. The output ends up in the text file as
Montr\xe9al, where the xe9 is a single character. When I try to copy that into
my PostgreSQL table, I get an error "ERROR: invalid byte sequence for encoding
"UTF8": 0xe9616c", which makes sense since the hex character has not been
sanitized.

Now if I run the output through sed first, and substitute the \xe9 character for
something, say the word TEST, I end up with MontrTESTal in my input file, and
naturally that imports just fine. So this tells me that I can perform a
substitute on the hex character. But I have been pulling my hair out trying to
figure out how to substitute in a properly escaped representation of the
accented e. For instance, this: s/\(\xe9\)/U\&'\1'/g gives me MontrU&'\xe9'al
in my input file, but that just causes a different invalid byte sequence error.

So my question is, how do I sanitize the hex character in the middle of a word
to be able to copy in Montreal with an accented e? Or am I going about this at
the wrong point?

Thanks,
Melvin

Responses

Re: Hex characters in COPY input at 2015-02-27 02:54:03 from Vick Khera
Re: Hex characters in COPY input at 2015-02-27 14:41:25 from Melvin Call

Browse pgsql-general by date

	From	Date	Subject
Next Message	Vick Khera	2015-02-27 02:54:03	Re: Hex characters in COPY input
Previous Message	Semyon Reyfman	2015-02-27 02:42:42	Re: ANALYZE after CREATE TABLE AS SELECT...