From: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
---|---|
To: | alvherre(at)commandprompt(dot)com |
Cc: | tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql(at)j-davis(dot)com, ishii(at)postgresql(dot)org, andrew(at)dunslane(dot)net, laurenz(dot)albe(at)wien(dot)gv(dot)at, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: invalidly encoded strings |
Date: | 2007-09-12 03:43:56 |
Message-ID: | 20070912.124356.41631070.t-ishii@sraoss.co.jp |
Lists: | pgsql-hackers pgsql-patches |
> However ISTM we would also need something like
>
> length(bytea, name) returns int
> -- counts the number of characters assuming that the bytea is in
> -- the given encoding
>
> Hmm, I wonder if counting chars is consistent regardless of the
> encoding the string is in. To me it sounds like it should, in which
> case it works to convert to the DB encoding and count chars there.
Not necessarily.
It's possible for the number of characters to differ before and after
encoding conversion. One example is UTF-8 and EUC_JIS_2004:
0xa4f7 (EUC_JIS_2004) <--> U+304B *and* U+309A (Unicode)
That is, a single EUC_JIS_2004 character maps to a sequence of two
Unicode code points. This mapping is defined in the Japanese
government's standard.
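
A minimal sketch of the mismatch, in Python rather than in backend code. It assumes Python's built-in `euc_jis_2004` codec follows the same JIS X 0213 mapping; the helpers `euc_char_count` and `converted_char_count` are hypothetical names for illustration only, and the lead-byte rule is a simplified reading of the EUC-JP family layout, not PostgreSQL's actual implementation.

```python
def euc_char_count(data: bytes) -> int:
    """Count characters in EUC-JP-family bytes by lead byte, without converting.

    Simplified assumption: input is well-formed; no validation is done.
    """
    count = 0
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            i += 1      # single-byte ASCII
        elif lead == 0x8E:
            i += 2      # SS2 prefix: half-width katakana, 2 bytes total
        elif lead == 0x8F:
            i += 3      # SS3 prefix: 3 bytes total
        else:
            i += 2      # ordinary two-byte character
        count += 1
    return count


def converted_char_count(data: bytes, encoding: str) -> int:
    """Count code points after decoding from the given encoding to Unicode."""
    return len(data.decode(encoding))


raw = b"\xa4\xf7"  # the single EUC_JIS_2004 character from the example above

print(euc_char_count(raw))                        # count in the source encoding
print(converted_char_count(raw, "euc_jis_2004"))  # count after conversion
print([f"U+{ord(c):04X}" for c in raw.decode("euc_jis_2004")])
```

The two counts disagree for this input, so a `length(bytea, name)` that converts first and counts afterwards would not necessarily report the same number as one that counts in the named encoding directly.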
--
Tatsuo Ishii
SRA OSS, Inc. Japan