Re: invalidly encoded strings

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, andrew(at)dunslane(dot)net, laurenz(dot)albe(at)wien(dot)gv(dot)at, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: invalidly encoded strings
Date:	2007-09-11 19:31:15
Message-ID:	17382.1189539075@sss.pgh.pa.us
Lists:	pgsql-hackers pgsql-patches

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Tom Lane wrote:
>> I think really the technically cleanest solution would be to make
>> convert() return bytea instead of text; then we'd not have to put
>> restrictions on what encoding or locale it's working inside of.
>> However, it's not clear to me whether there are valid usages that
>> that would foreclose. Tatsuo mentioned length() but bytea has that.

> But length(bytea) cannot count characters, only bytes.

So what? If you want characters, just count the original text string.
Encoding conversion won't change that.
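
As a rough illustration of the point above (not PostgreSQL code): in the sketch below a Python `str` stands in for a text value and `bytes` stands in for bytea. The sample string and the choice of UTF-8 and Latin-1 are assumptions made only for the example.

```python
# Illustrative sketch: str plays the role of "text", bytes the role of "bytea".
# The sample string and the two encodings are arbitrary choices for the example.
original = "na\u00efve caf\u00e9"  # "naïve café"

# Converting to a target encoding yields a byte string, much like a
# convert() that returned bytea would.
utf8_bytes = original.encode("utf-8")
latin1_bytes = original.encode("latin-1")

# Byte lengths depend on the encoding ...
utf8_len = len(utf8_bytes)      # the two accented letters take 2 bytes each
latin1_len = len(latin1_bytes)  # every character takes 1 byte

# ... but the character count is a property of the original string,
# so it can be taken before conversion.
char_count = len(original)

# Decoding with the matching encoding gives back the same number of characters.
roundtrip_count = len(latin1_bytes.decode("latin-1"))
```

The byte lengths differ between the two encodings, while the character count taken from the original string stays the same.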

> Hmm, I wonder if counting chars is consistent regardless of the
> encoding the string is in. To me it sounds like it should, in which
> case it works to convert to the DB encoding and count chars there.

A conversion that isn't one-for-one is not merely an encoding conversion
IMHO.
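
One way to picture the distinction, offered as a hedged sketch rather than anything from the PostgreSQL sources: Unicode normalization (used here as an assumed example of a non-one-for-one transformation) changes the number of characters, whereas encoding and decoding the result does not.

```python
import unicodedata

# Assumed example: the precomposed form of "café" (4 code points).
composed = "caf\u00e9"

# NFD normalization splits U+00E9 into "e" plus a combining acute accent.
# This is a transformation of the text itself, not just of its encoding.
decomposed = unicodedata.normalize("NFD", composed)

composed_count = len(composed)
decomposed_count = len(decomposed)

# A plain encoding round trip leaves the character count alone in both cases.
composed_roundtrip = len(composed.encode("utf-8").decode("utf-8"))
decomposed_roundtrip = len(decomposed.encode("utf-8").decode("utf-8"))
```

Here the count changes only at the normalization step; the encode/decode steps are one-for-one.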

regards, tom lane

In response to

Re: invalidly encoded strings at 2007-09-11 19:26:42 from Alvaro Herrera
