From: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
---|---|
To: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
Cc: | tgl(at)sss(dot)pgh(dot)pa(dot)us, kleptog(at)svana(dot)org, pgsql(at)markdilger(dot)com, all(at)adv(dot)magwien(dot)gv(dot)at, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Bug in UTF8-Validation Code? |
Date: | 2007-04-04 15:50:32 |
Message-ID: | 20070404155032.GH8549@alvh.no-ip.org |
Lists: | pgsql-hackers |
Tatsuo Ishii wrote:
> BTW, every encoding has its own charset. However, the relationship
> between encoding and charset is not as simple as it is in Unicode. For
> example, the encoding EUC_JP corresponds to multiple charsets, namely
> ASCII, JIS X 0201, JIS X 0208 and JIS X 0212. So a function which
> returns a "code point" is not very useful, since it lacks the charset
> info. I think we need to continue the design discussion, probably
> targeting 8.4, not 8.3.
Is Unicode complete as far as Japanese chars go? I mean, is there a
character in EUC_JP that is not representable in Unicode?
If Unicode is complete, ISTM it makes perfect sense to have a
unicode_char() (or whatever we end up calling it) that takes a Unicode
code point and returns the character in whatever JIS set you want
(selected by setting client_encoding accordingly), because that would
solve the problem nicely.
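A rough sketch of the proposed behavior, in Python purely for illustration. unicode_char() here is hypothetical (no such function exists yet), and Python codec names such as "euc_jp" and "utf-8" stand in for client_encoding settings. It encodes one Unicode code point into the target encoding and fails when that encoding has no mapping, which is exactly the case the completeness question above is about.

```python
def unicode_char(code_point: int, client_encoding: str) -> bytes:
    """Hypothetical unicode_char(): encode one Unicode code point.

    client_encoding is a Python codec name used as a stand-in for a
    PostgreSQL client_encoding value (an assumption for this sketch).
    Raises ValueError if the value is not a Unicode scalar value or if
    the target encoding has no mapping for the character.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    try:
        # chr() builds the character; encode() maps it into the target set.
        return chr(code_point).encode(client_encoding)
    except UnicodeEncodeError as exc:
        # Unmappable character (or a lone surrogate for UTF-8).
        raise ValueError(
            f"U+{code_point:04X} has no mapping in {client_encoding}"
        ) from exc
```

For example, unicode_char(0x3042, "euc_jp") yields the two bytes 0xA4 0xA2 (HIRAGANA LETTER A, from JIS X 0208), while U+1F600 raises ValueError because EUC-JP has no mapping for it.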
One thing I find confusing in your text above: is EUC_JP an encoding or
a charset? I would think that the various JIS X standards are encodings
and EUC_JP is the charset; or is it the other way around?
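For reference, the split Tatsuo describes is visible in the EUC-JP byte layout: the lead byte (or one of the single-shift bytes SS2 = 0x8E and SS3 = 0x8F) selects which charset the following bytes belong to. A minimal Python sketch, assuming the common EUC-JP byte ranges (this is not PostgreSQL's own verification code), decodes bytes into (charset, code) pairs, where the code is the character's position within its own charset.

```python
def euc_jp_code_points(data: bytes) -> list[tuple[str, int]]:
    """Decode EUC-JP bytes into (charset, code) pairs.

    The code is the position within the charset itself, so the same
    number can denote different characters in different charsets.
    """
    result = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:  # single byte: ASCII
            result.append(("ASCII", b))
            i += 1
            continue
        if b == 0x8E:  # SS2 + one JIS X 0201 katakana byte
            charset, start, size, hi = "JIS X 0201", i + 1, 1, 0xDF
        elif b == 0x8F:  # SS3 + two JIS X 0212 bytes
            charset, start, size, hi = "JIS X 0212", i + 1, 2, 0xFE
        elif 0xA1 <= b <= 0xFE:  # two JIS X 0208 bytes, no prefix
            charset, start, size, hi = "JIS X 0208", i, 2, 0xFE
        else:
            raise ValueError(f"invalid EUC-JP lead byte {b:#04x} at offset {i}")
        payload = data[start:start + size]
        if len(payload) != size or not all(0xA1 <= t <= hi for t in payload):
            raise ValueError(f"invalid EUC-JP sequence at offset {i}")
        if size == 1:
            code = payload[0]  # JIS X 0201 uses 8-bit codes
        else:
            # 94x94 sets: strip the high bit to get the row/cell code
            code = ((payload[0] & 0x7F) << 8) | (payload[1] & 0x7F)
        result.append((charset, code))
        i = start + size
    return result
```

Decoding b"\xa4\xa2" and b"\x8f\xa4\xa2" both yield code 0x2422, once tagged JIS X 0208 and once JIS X 0212, which illustrates why a bare "code point" is ambiguous without the charset.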
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support