From: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
---|---|
To: | tgl(at)sss(dot)pgh(dot)pa(dot)us |
Cc: | ishii(at)postgresql(dot)org, andrew(at)dunslane(dot)net, laurenz(dot)albe(at)wien(dot)gv(dot)at, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: invalidly encoded strings |
Date: | 2007-09-11 02:27:50 |
Message-ID: | 20070911.112750.70199461.t-ishii@sraoss.co.jp |
Lists: | pgsql-hackers pgsql-patches |
> Tatsuo Ishii <ishii(at)postgresql(dot)org> writes:
> > If you regard the unicode code point as simply a number, why not
> > regard the multibyte characters as a number too?
>
> Because there's a standard specifying the Unicode code points *as
> numbers*. The mapping from those numbers to UTF8 strings (and other
> representations) is well-defined by the standard.
>
> > Also I'm wondering what we should do with different
> > backend/frontend encoding combo.
>
> Nothing. chr() has always worked with reference to the database
> encoding, and we should keep it that way.

Where is it documented?

> BTW, it strikes me that there is another hole that we need to plug in
> this area, and that's the convert() function. Being able to create
> a value of type text that is not in the database encoding is simply
> broken. Perhaps we could make it work on bytea instead (providing
> a cast from text to bytea but not vice versa), or maybe we should just
> forbid the whole thing if the database encoding isn't SQL_ASCII.

Please don't do that. It will break a useful use case of convert().

Suppose a user has a database encoded in UTF-8, with English, French,
Chinese and Japanese data in its tables. To sort a table in the order
appropriate for its language, he would do something like this:

    SELECT * FROM japanese_table ORDER BY convert(japanese_text using utf8_to_euc_jp);

Without convert(), he will get the rows in an essentially arbitrary
order. This is because Kanji characters are not in any meaningful
order in UTF-8, while they are reasonably ordered in EUC_JP.
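
To make the ordering point concrete, here is a minimal sketch in Python rather than SQL. It assumes a plain byte-wise comparison (as with the C locale) and uses a hypothetical handful of Kanji picked only for illustration. It sorts the same characters by their UTF-8 bytes and by their EUC_JP bytes. The EUC_JP order follows the JIS X 0208 layout, where the level 1 Kanji are arranged roughly by reading.

```python
# Hypothetical sample data: seven common Kanji, chosen only for illustration.
words = ["雨", "王", "愛", "円", "右", "亜", "一"]

# Byte-wise order of the UTF-8 encoding (same as Unicode code point order).
utf8_order = sorted(words, key=lambda s: s.encode("utf-8"))

# Byte-wise order of the EUC_JP encoding (follows the JIS X 0208 layout).
eucjp_order = sorted(words, key=lambda s: s.encode("euc_jp"))

print("UTF-8 :", " ".join(utf8_order))
print("EUC_JP:", " ".join(eucjp_order))
```

The two printed lines list the same characters in different orders, which is the difference the ORDER BY convert(...) query above relies on.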
--
Tatsuo Ishii
SRA OSS, Inc. Japan