Re: Understanding Encoding

From: Sebastien FLAESCH <sf(at)4js(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Understanding Encoding
Date: 2013-09-06 07:53:38
Message-ID: 52298A02.4060607@4js.com
Lists: pgsql-novice pgsql-sql

Hi,

Tip:

To identify which encoding your input to the psql command interpreter uses:

1) Open a file with vim
2) Type in your SQL or copy/paste it
3) Save the file and quit vim
4) $ file <filename>

This should report the encoding of that text file.

For example:

sf(at)orca:~$ echo $LC_ALL
en_US.UTF-8

sf(at)orca:~$ cat /tmp/xx
abcdefé

sf(at)orca:~$ file /tmp/xx
/tmp/xx: UTF-8 Unicode text
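
If the file command is not at hand, the same check can be roughed out in Python. This is only a sketch: guess_encoding is a made-up helper name, the candidate list is my assumption, and trial decoding is a heuristic (many byte strings are valid in more than one encoding), so treat the answer as a hint rather than proof.

```python
def guess_encoding(data: bytes, candidates=("ascii", "utf-8", "euc_kr")) -> str:
    """Return the first candidate codec that decodes `data` without error.

    Hypothetical helper: the candidate order matters, and a successful
    decode only means the bytes are *valid* in that codec.
    """
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return "unknown"


# Same text as /tmp/xx in the example above, saved as UTF-8.
sample = "abcdefé\n".encode("utf-8")
print(guess_encoding(sample))
```

For a real file, pass open(path, "rb").read() instead of the in-memory sample.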

Seb

On 09/06/2013 09:03 AM, Tatsuo Ishii wrote:
>> Hello All,
>>
>> I am not able to understand how the encoding is handled. I would be happy
>> if someone can tell what is happening in the following scenario:
>>
>> 1. I have created a database with EUC_KR encoding, created a table, and
>> inserted some Korean values into it.
>>
>> =# CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr'
>> LC_CTYPE='ko_KR.euckr' TEMPLATE=template0;
>>
>> =# \c korean
>>
>> korean=# SHOW client_encoding;
>> client_encoding
>> -----------------
>> UTF8
>> (1 row)
>>
>> korean=# CREATE TABLE tbl (doc text);
>>
>> korean=# INSERT INTO tbl VALUES ('그레스');
>>
>>
>> 2. If I insert non-Korean values, it throws an error:
>>
>> korean=# INSERT INTO tbl VALUES ('データベース');
>> ERROR: character with byte sequence 0xe3 0x83 0xbc in encoding "UTF8" has
>> no equivalent in encoding "EUC_KR"
>
> The error message says it all. PostgreSQL accepted 'データベース'
> encoded in UTF-8 then tried to convert to EUC_KR but failed, because
> EUC_KR does not accept languages other than Korean (and ASCII). What
> else did you expect?
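
To see which character the error is complaining about, the three bytes from the message can be decoded by hand. A small Python sketch follows; can_encode is a hypothetical helper, and Python's euc_kr codec is not PostgreSQL's conversion table, so the per-string results are only indicative.

```python
# Bytes quoted in the PostgreSQL error message.
bad = bytes([0xE3, 0x83, 0xBC])
ch = bad.decode("utf-8")
print(ch, f"U+{ord(ch):04X}")  # ー U+30FC


def can_encode(text: str, codec: str) -> bool:
    """Hypothetical helper: True if every character of `text` exists in `codec`."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# Compare the Korean and the Japanese test strings from the thread.
for s in ("그레스", "データベース"):
    print(s, can_encode(s, "euc_kr"))
```

So the conversion stops at one specific character of the string, the one reported in the error.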
>
>> korean=# SELECT * FROM tbl;
>> doc
>> --------
>> 그레스
>> (1 row)
>>
>>
>> 3. I change the client encoding to EUC_KR and try inserting the same Korean
>> characters, and it throws an error:
>>
>> korean=# SET client_encoding = 'EUC_KR';
>> SET
>> korean=# INSERT INTO tbl VALUES ('그레스');
>> ERROR: invalid byte sequence for encoding "EUC_KR": 0xa0 0x88
>
> 0xa0 is definitely not part of EUC_KR. That's why PostgreSQL throws an
> error. I guess you are using UHC (Unified Hangul Code), rather than
> EUC_KR. They are different encodings. You should do either:
>
> 1) Make sure that your terminal encoding is EUC_KR.
>
> 2) set client_encoding = 'uhc';
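
Regarding point 1), here is one way to check where 0xa0 0x88 can come from. The sketch assumes (my assumption, not confirmed in the thread) that the terminal kept sending UTF-8 after the SET; decodes_as is a hypothetical helper, and Python's codec stands in for the server-side validation.

```python
text = "그레스"

# What a UTF-8 terminal would send for this string.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes.hex(" "))  # ea b7 b8 eb a0 88 ec 8a a4


def decodes_as(data: bytes, codec: str) -> bool:
    """Hypothetical helper: True if `data` is a valid byte string in `codec`."""
    try:
        data.decode(codec)
    except UnicodeDecodeError:
        return False
    return True


# The pair from the error message appears in the UTF-8 form of the string.
print(b"\xa0\x88" in utf8_bytes)
# Validating those same bytes as EUC_KR, as client_encoding = 'EUC_KR' asks for.
print(decodes_as(utf8_bytes, "euc_kr"))
```

Under that assumption, the client_encoding setting and the bytes the terminal actually sends disagree, which is what point 1) guards against.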
>
>> Even the SELECT statement displays something different. I am not able to
>> understand why?
>>
>> korean=# SELECT * FROM tbl;
>> doc
>> --------
>> �׷���
>> (1 row)
>
> This is for the same reason as above.
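
The garbled SELECT output can be imitated the same way. This sketch assumes the server returns the stored EUC_KR bytes unconverted (because client_encoding now matches the database) and that the terminal renders them as UTF-8; the variable names are illustrative only.

```python
# Bytes as an EUC_KR database would hand them to an EUC_KR client.
stored = "그레스".encode("euc_kr")

# What a UTF-8 terminal makes of those bytes: invalid sequences become U+FFFD.
shown = stored.decode("utf-8", errors="replace")
print(shown)

# Decoding with the matching codec gives the original text back.
print(stored.decode("euc_kr"))
```

The data in the table is intact; only the interpretation on the display side is wrong.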
>
>> Can someone please help me.
>>
>> Thank you,
>>
>> Beena Emerson
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
