Re: Understanding Encoding

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: memissemerson(at)gmail(dot)com
Cc: pgsql-sql(at)postgresql(dot)org, pgsql-novice(at)postgresql(dot)org
Subject: Re: Understanding Encoding
Date: 2013-09-06 07:03:12
Message-ID: 20130906.160312.318878479653679358.t-ishii@sraoss.co.jp
Lists: pgsql-novice pgsql-sql

> Hello All,
>
> I am not able to understand how the encoding is handled. I would be happy
> if someone can tell what is happening in the following scenario:
>
> 1. I have created a database with EUC_KR encoding and created a table and
> inserted some korean value into it.
>
> =# CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr'
> LC_CTYPE='ko_KR.euckr' TEMPLATE=template0;
>
> =# \c korean
>
> korean=# SHOW client_encoding;
> client_encoding
> -----------------
> UTF8
> (1 row)
>
> korean=# CREATE TABLE tbl (doc text);
>
> korean=# INSERT INTO tbl VALUES ('그레스');
>
>
> 2. If I insert non-korean values it throws error:
>
> korean=# INSERT INTO tbl VALUES ('データベース');
> ERROR: character with byte sequence 0xe3 0x83 0xbc in encoding "UTF8" has
> no equivalent in encoding "EUC_KR"

The error message says it all. PostgreSQL accepted 'データベース'
encoded in UTF-8, then tried to convert it to EUC_KR and failed, because
the character 'ー' (0xe3 0x83 0xbc in UTF-8) has no equivalent in
EUC_KR, which is meant for Korean (and ASCII) text. What else did you
expect?
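
The conversion step described above can be approximated outside the
database. The following is a minimal sketch in Python, not what
PostgreSQL runs internally: the helper name unencodable is made up for
illustration, and Python's euc_kr codec is assumed to be close to, but
not necessarily identical to, PostgreSQL's UTF8-to-EUC_KR conversion
table.

```python
def unencodable(text: str, encoding: str) -> list[str]:
    """Return the characters of `text` that `encoding` cannot represent.

    Hypothetical helper for illustration; it relies on Python's codec,
    which is assumed to approximate the server-side conversion.
    """
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    for ch in unencodable("データベース", "euc_kr"):
        # Show the UTF-8 bytes in the same style as the PostgreSQL error.
        print(ch, " ".join(f"0x{b:02x}" for b in ch.encode("utf-8")))
```

Running the same helper on '그레스' should report nothing missing, which
matches the first INSERT succeeding.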

> korean=# SELECT * FROM tbl;
> doc
> --------
> 그레스
> (1 row)
>
>
> 3. I change the client encoding to EUC_KR and try inserting the same korean
> characters and it throws an error:
>
> korean=# SET client_encoding = 'EUC_KR';
> SET
> korean=# INSERT INTO tbl VALUES ('그레스');
> ERROR: invalid byte sequence for encoding "EUC_KR": 0xa0 0x88

0xa0 is definitely not part of EUC_KR (its two-byte characters use
bytes 0xa1 through 0xfe). That's why PostgreSQL throws an error. I
guess you are using UHC (Unified Hangul Code) rather than EUC_KR. They
are different encodings. You should do one of the following:

1) Make sure that your terminal encoding is EUC_KR.

2) set client_encoding = 'uhc';
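
A guess about the terminal encoding can be checked by looking at the
bytes each candidate encoding would send for the same string. This is a
rough sketch in Python; the helper name encodings_producing is
hypothetical, and cp949 is used as Python's codec for UHC. On the
poster's string, only the UTF-8 bytes contain the 0xa0 0x88 pair from
the error message, which would suggest the terminal was still sending
UTF-8 while client_encoding said EUC_KR. Treat this as an illustration,
not a verified diagnosis of that setup.

```python
def encodings_producing(text: str, needle: bytes, candidates: list[str]) -> list[str]:
    """Return the candidate encodings whose bytes for `text` contain `needle`.

    Hypothetical helper for illustration only.
    """
    hits = []
    for name in candidates:
        try:
            data = text.encode(name)
        except UnicodeEncodeError:
            continue  # this encoding cannot represent the text at all
        if needle in data:
            hits.append(name)
    return hits


if __name__ == "__main__":
    candidates = ["utf-8", "euc_kr", "cp949"]  # cp949 is Python's codec for UHC
    for name in candidates:
        # Print the raw bytes a terminal using this encoding would send.
        print(name, "그레스".encode(name).hex(" "))
    print(encodings_producing("그레스", b"\xa0\x88", candidates))
```

Whatever the terminal turns out to use, the remedy is the same as in the
two options above: client_encoding has to name the encoding the terminal
really sends.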

> Even the SELECT statement displays something different. I am not able to
> understand why?
>
> korean=# SELECT * FROM tbl;
> doc
> --------
> �׷���
> (1 row)

This is for the same reason as above.
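
The garbled SELECT output can be imitated the same way: with
client_encoding set to EUC_KR the server returns EUC_KR bytes, and the
terminal decodes them with whatever encoding it actually uses. A small
sketch in Python, assuming purely for illustration a terminal that
decodes as UTF-8; the function name as_seen_by_terminal is made up.

```python
def as_seen_by_terminal(text: str, sent_as: str, terminal: str) -> str:
    """Encode `text` as the server would send it, then decode it the way
    a terminal using a different encoding would display it.

    Hypothetical helper; undecodable bytes become U+FFFD, the same
    replacement character that shows up in the quoted output.
    """
    raw = text.encode(sent_as)
    return raw.decode(terminal, errors="replace")


if __name__ == "__main__":
    # Mismatch: EUC_KR bytes shown by a terminal that expects UTF-8.
    print(as_seen_by_terminal("그레스", "euc_kr", "utf-8"))
    # Match: the text round-trips unchanged.
    print(as_seen_by_terminal("그레스", "euc_kr", "euc_kr"))
```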

> Can someone please help me.
>
> Thanks you,
>
> Beena Emerson
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
