From: | Mathijs Brands <mathijs(at)ilse(dot)net> |
---|---|
To: | pgsql-hackers list <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Encoding conversions in psql |
Date: | 2004-01-08 14:21:19 |
Message-ID: | 20040108142119.GA13264@ilse.net |
Lists: | pgsql-hackers |
Howdy,
Can anyone explain to me when psql tries to convert between encodings?
It seems to disregard encodings set with SET CLIENT_ENCODING.
The following reproduces the behaviour I'm seeing:
1. create a UNICODE database
2. run the following:
set client_encoding to latin1;
create table bla(a text);
insert into bla values('meëep');
3. try the following from psql:
Welcome to psql 7.3.4, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help on internal slash commands
       \g or terminate with semicolon to execute query
       \q to quit

mathijs=# select * from bla;
   a
-------
 meëep
(1 row)

mathijs=# set client_encoding = latin1;
SET
mathijs=# select * from bla;
  a
------
 meep
(1 row)

mathijs=# \encoding latin1
mathijs=# select * from bla;
   a
-------
 meëep
(1 row)

After setting CLIENT_ENCODING with SET, the middle character gets dropped.
To me it seems like psql still considers the data it gets from the server
to be UTF8, tries to interpret it as UTF8, sees the LATIN1 byte for ë
(0xEB, which on its own is indeed not a valid UTF8 sequence) and drops it.
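
The suspected mechanism can be reproduced outside of psql. The following is only a minimal sketch in Python of the hypothesis above, not psql's actual code: it assumes the server sends the string as LATIN1 bytes and that the consumer decodes them as UTF8 while silently skipping invalid bytes. The helper name `is_valid_utf8` is made up for this illustration.

```python
# Illustration of the hypothesis only; this is NOT how psql is implemented.
# Assumption: the server sends LATIN1 bytes, the reader still expects UTF8.

text = "meëep"
latin1_bytes = text.encode("latin-1")  # ë becomes the single byte 0xEB


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xEB announces a three-byte UTF-8 sequence, but the next byte ('e')
# is not a continuation byte, so the sequence is invalid.
print(is_valid_utf8(latin1_bytes))  # False

# A reader that skips invalid bytes loses the character entirely.
lenient = latin1_bytes.decode("utf-8", errors="ignore")
print(lenient)  # meep

# A reader that knows the real encoding gets the original string back.
correct = latin1_bytes.decode("latin-1")
print(correct)  # meëep
```

The "meep" result matches the second query in the session, and the "meëep" result matches what is shown once \encoding latin1 has been issued, which is at least consistent with the idea that only \encoding updates the client's notion of the encoding.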
My question is: why does psql seem to think it's receiving UTF8 data
-after- I've changed the client_encoding? I've checked with a network
sniffer that the results returned by the server are the same with or
without using \encoding (as expected). Is this behaviour a bug? If not,
it is not very obvious to me; I would expect psql to keep track of the
encoding set between the server and the client.
Cheers,
Mathijs