Re: Why don't I get a LATIN1 encoding here with SET ENCODING?

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Bryce Nesbitt <bryce2(at)obviously(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Why don't I get a LATIN1 encoding here with SET ENCODING?
Date: 2009-11-04 02:29:35
Message-ID: 4AF0E70F.1030201@postnewspapers.com.au
Lists: pgsql-sql

Bryce Nesbitt wrote:
> I'm tracking another bug, but wanted to verify stuff on the command line. I
> can't figure out why this did not work:

> dblack3-deleteme=> insert into bryce1 values(1,2,'test\375');
> ERROR: invalid byte sequence for encoding "UTF8": 0xfd

I'd say the server is interpreting your query text as Latin-1 and
converting it to the server encoding, UTF-8, as it should. The query
text is plain ASCII, so the conversion leaves it unchanged, resulting in
the UTF-8 string:

insert into bryce1 values(1,2,'test\375');

Only *then* does it process the escapes in that string. As test\xfd
(bytes 0x74 0x65 0x73 0x74 0xfd) isn't valid UTF-8, the server rejects it.

If my understanding is right then the trouble is that the
client_encoding setting doesn't affect string escapes in SQL queries.
The conversion of the query text from client to server encoding is done
before string escapes are processed.
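
To illustrate the ordering I'm describing, here is a rough Python sketch.
It is not PostgreSQL's actual code path; the function names are made up
for illustration, and the escape handling only covers octal \ooo escapes.
It just mimics "convert encoding first, process escapes second":

```python
import re

def convert_client_to_server(query_bytes, client_encoding="latin-1"):
    # Step 1 (hypothetical helper): transcode the raw query text from the
    # client encoding to the server encoding, assumed here to be UTF-8.
    return query_bytes.decode(client_encoding).encode("utf-8")

def process_octal_escapes(data):
    # Step 2 (hypothetical helper): replace each \ooo octal escape with the
    # single raw byte it names. No encoding conversion happens here.
    return re.sub(
        rb"\\([0-7]{1,3})",
        lambda m: bytes([int(m.group(1), 8) & 0xFF]),
        data,
    )

def is_valid_utf8(data):
    # True if the byte string decodes cleanly as UTF-8.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# The query as typed: the escape is just the ASCII characters \ 3 7 5.
query = rb"insert into bryce1 values(1,2,'test\375');"

converted = convert_client_to_server(query)   # ASCII only, so unchanged
unescaped = process_octal_escapes(converted)  # now contains a raw 0xfd byte
```

In this sketch `converted` is byte-for-byte identical to `query`, while
`unescaped` contains the lone byte 0xfd and fails `is_valid_utf8`, which
matches the error in the original report.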

In truth, that's how I'd expect it to happen. If I ask for the byte 0xfd
in a string, I don't want the server to decide that I must've meant
something else because I have a different client encoding. If I wanted
encoding conversion, I wouldn't have written it in escape form; I'd
have written 'ý', not '\375'.
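
For contrast, a small Python sketch of what I'd expect to happen to a
literal 'ý' sent by a Latin-1 client. Again this only imitates the
conversion, assuming a UTF-8 server encoding; it isn't the server's code:

```python
# "\u00fd" is the character ý (LATIN SMALL LETTER Y WITH ACUTE).
# A Latin-1 client sends it on the wire as the single byte 0xfd.
literal_from_client = "\u00fd".encode("latin-1")

# Client-to-server conversion turns that byte into the two-byte UTF-8 form.
as_stored = literal_from_client.decode("latin-1").encode("utf-8")
```

Here `literal_from_client` is the single byte 0xfd, and `as_stored` is the
valid UTF-8 sequence 0xc3 0xbd. The escaped form never goes through that
conversion, so the raw 0xfd reaches the server's encoding check as-is.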

--
Craig Ringer
