From: | "David E(dot) Wheeler" <david(at)justatheory(dot)com> |
---|---|
To: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Bad Data back Door |
Date: | 2012-10-06 00:54:55 |
Message-ID: | A9E454DD-DB0A-4D1F-A6E6-888ED4FCFE4F@justatheory.com |
Lists: pgsql-hackers
Hackers,
I’ve discovered something a bit disturbing at $work. We’re migrating (slowly) from Oracle to PostgreSQL, and in some cases are using oracle_fdw to copy data over. Alas, there are a fair number of text values in the Oracle database that, although the database is UTF-8, are actually something else (CP1252 or Latin1). When we copy from an oracle_fdw foreign table into a PostgreSQL table, PostgreSQL does not complain, but ends up storing the mis-encoded strings, even though the database is UTF-8.
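For anyone hunting for such values after the fact, here is a minimal sketch of a validity check, assuming a UTF-8 database (is_valid_utf8 is just an illustrative name, not anything built in):

    CREATE OR REPLACE FUNCTION is_valid_utf8(s text) RETURNS boolean
    LANGUAGE plpgsql IMMUTABLE AS $$
    BEGIN
        -- textsend() yields the raw stored bytes; convert_from() re-validates
        -- them as UTF-8 and raises an error on any invalid byte sequence.
        PERFORM convert_from(textsend(s), 'UTF8');
        RETURN true;
    EXCEPTION WHEN character_not_in_repertoire OR untranslatable_character THEN
        RETURN false;
    END;
    $$;

Then SELECT * FROM some_table WHERE NOT is_valid_utf8(some_column) turns up the smuggled rows.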
I assume that this is because the foreign table, being a table, is presumed by the system to contain valid data, and therefore additional character-encoding validation is skipped, yes?
If so, I’m wondering if it might be possible to add some sort of option to the CREATE FOREIGN TABLE statement to the effect that certain values should not be trusted to be in the encoding they say they are.
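Purely to illustrate the idea (this syntax does not exist today; validate_encoding is a made-up option name, and the server and table names are placeholders):

    CREATE FOREIGN TABLE ora_things (
        id   integer,
        name text
    ) SERVER oracle OPTIONS (
        table 'THINGS',
        validate_encoding 'true'  -- hypothetical: verify strings against the server encoding on fetch
    );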
At any rate, I’m spending some quality time re-encoding bogus values I never expected to see in our systems. :-(
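For the record, assuming the bogus values really are raw CP1252 bytes (and with things/name standing in for the real names), something like this reinterprets them in place; note that PostgreSQL may still reject the handful of byte values CP1252 leaves undefined, so test on a copy first:

    UPDATE things
       SET name = convert_from(textsend(name), 'WIN1252')  -- reinterpret stored bytes as CP1252
     WHERE NOT is_valid_utf8(name);                        -- helper sketched above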
Thanks,
David