Quick Links

Re: Need magic that checks if a collumn contains values that CAN NOT be converted to Latin1

From:	Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To:	Andreas <maps(dot)on(at)gmx(dot)net>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Need magic that checks if a collumn contains values that CAN NOT be converted to Latin1
Date:	2010-04-29 04:35:55
Message-ID:	4BD90CAB.5030609@postnewspapers.com.au
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On 29/04/10 11:02, Andreas wrote:
> Hi,
>
> I've got an 8.4.3 Unicode DB that accidentally holds a few records with
> characters that can't be converted to Latin1 or 9 for output to CSV.
>
> I'd just need a way to check if a collumn contains values that CAN NOT
> be converted from Utf8 to Latin1 to select all those affected records.
>
> I tried:
> Select convert_to (my_column::text, 'LATIN1') from my_table;
>
> It raises an error that says translated:
> ERROR: character 0xe28093 in encoding »UTF8« has no equivalent in »LATIN1«
> Regrettably it doesn't explain where it found this sign.

I'd use a PL/PgSQL procedure to step through the result of a FOR IN
SELECT, running each test in an exception handling block.

> Select '\xe28093'
> complains that this weren't a valid UTF8 code at all.
> So how was it accepted and stored in the first place?

Because that's the escape E'\xe2" followed by the literal characters
"8093". "\xe2" followed by those characters isn't legal utf-8.

I'm pretty sure the error actually refers to:

select E'\xe2\x80\x93';

which is the character "–" (U+2013 EN DASH).

Yes, it'd be nice if PostgreSQL's error message was in syntax that
PostgreSQL understood, not pseudo-C-style hex literal form.

--
Craig Ringer

In response to

Need magic that checks if a collumn contains values that CAN NOT be converted to Latin1 at 2010-04-29 03:02:58 from Andreas

Browse pgsql-general by date

	From	Date	Subject
Next Message	Piotr Kublicki	2010-04-29 07:36:22	Re: pg_hba.conf
Previous Message	Andreas	2010-04-29 03:02:58	Need magic that checks if a collumn contains values that CAN NOT be converted to Latin1