
Re: invalidly encoded strings

From:	Tatsuo Ishii <ishii(at)postgresql(dot)org>
To:	alvherre(at)commandprompt(dot)com
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql(at)j-davis(dot)com, ishii(at)postgresql(dot)org, andrew(at)dunslane(dot)net, laurenz(dot)albe(at)wien(dot)gv(dot)at, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: invalidly encoded strings
Date:	2007-09-12 03:43:56
Message-ID:	20070912.124356.41631070.t-ishii@sraoss.co.jp
Lists:	pgsql-hackers pgsql-patches

> However ISTM we would also need something like
>
> length(bytea, name) returns int
> -- counts the number of characters assuming that the bytea is in
> -- the given encoding
>
> Hmm, I wonder if counting chars is consistent regardless of the
> encoding the string is in. To me it sounds like it should, in which
> case it works to convert to the DB encoding and count chars there.

Not necessarily.

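The number of characters can change when a string is converted from
one encoding to another. EUC_JIS_2004 and UTF-8 are one example: a
single EUC_JIS_2004 character maps to a sequence of two Unicode code
points (HIRAGANA LETTER KA followed by the combining semi-voiced
sound mark).

0xa4f7 (EUC_JIS_2004) <--> U+304B *and* U+309A (Unicode)

This mapping is defined in the Japanese government's standard.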
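A minimal Python sketch makes the mismatch concrete. This is not PostgreSQL code. It assumes a simplified EUC-style length rule, similar in spirit to an mblen function: byte 0x8F starts a 3-byte character, any other byte >= 0x80 starts a 2-byte character, and ASCII bytes count as one character each. The helper name `euc_char_count` is hypothetical. The sketch compares that count with the number of code points Python's built-in `euc_jis_2004` codec produces for the same bytes.

```python
def euc_char_count(data: bytes) -> int:
    """Count characters in EUC-encoded bytes (hypothetical, simplified).

    Assumes valid input: 0x8F (SS3) starts a 3-byte character, any other
    byte >= 0x80 (including SS2, 0x8E) starts a 2-byte character, and
    ASCII bytes are single characters. No validation is done.
    """
    count = 0
    i = 0
    while i < len(data):
        lead = data[i]
        if lead == 0x8F:
            i += 3  # JIS X 0213 plane 2 via SS3
        elif lead >= 0x80:
            i += 2  # two-byte character (plane 1, or SS2 half-width kana)
        else:
            i += 1  # ASCII
        count += 1
    return count


raw = b"\xa4\xf7"  # the EUC_JIS_2004 character from the example above

# Characters as counted in the source encoding.
print(euc_char_count(raw))

# Code points after conversion to Unicode with Python's codec.
decoded = raw.decode("euc_jis_2004")
print(len(decoded), [f"U+{ord(c):04X}" for c in decoded])
```

With CPython's codec the two counts should differ (one character before conversion, two code points after). So counting characters after converting to the database encoding does not, in general, give the character count in the original encoding.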
--
Tatsuo Ishii
SRA OSS, Inc. Japan

In response to

Re: invalidly encoded strings at 2007-09-11 19:26:42 from Alvaro Herrera
