finding rows with invalid characters

From:	Sim Zacks <sim(at)compulab(dot)co(dot)il>
To:	PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject:	finding rows with invalid characters
Date:	2010-11-21 07:54:23
Message-ID:	4CE8D02F.8060200@compulab.co.il
Lists:	pgsql-general

I am using PG 8.2.17 with UTF8 encoding.
"PostgreSQL 8.2.17 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.1
(Gentoo 4.1.1)"

One of my tables somehow has invalid characters in it:
> ERROR: invalid byte sequence for encoding "UTF8": 0xa9
> HINT: This error can also happen if the byte sequence does not match
> the encoding expected by the server, which is controlled by
> "client_encoding".
I have already found a number of the bad rows manually by running
queries that apply a text function (upper) to ranges of IDs, narrowing
the range until I reached the specific bad row.

1) Is there a quicker way to get a list of all rows with invalid characters?
2) Shouldn't the database prevent these rows from being entered in the
first place?
3) I have backups of this database (using -Fc) and I noticed that on
restore, this table is not restored because of this error. Is there a
way to fix the existing backups, or tell the restore to ignore bad rows
instead of erroring out the whole table?
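
To make question 1 concrete, here is a minimal sketch of the kind of check
I am after, written outside the database. It assumes the raw bytes of each
row can be obtained without the server validating them (for example from a
plain-text dump); that step is not shown and is an assumption. The function
name find_invalid_utf8 and the sample rows are made up for illustration.
A lone 0xa9 byte is the copyright sign in Latin-1, but in UTF-8 it is a
continuation byte and cannot stand on its own, which would explain the
error above.

```python
def find_invalid_utf8(rows):
    """Yield (row_id, byte_offset, bad_byte) for each value that is not valid UTF-8.

    `rows` is any iterable of (row_id, raw_bytes) pairs; how the raw bytes
    are fetched is outside the scope of this sketch.
    """
    for row_id, raw in rows:
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first byte that failed to decode
            yield row_id, exc.start, raw[exc.start]


# Hypothetical sample data: row 2 holds a Latin-1 copyright sign (0xa9),
# row 3 holds the same character correctly encoded as UTF-8 (0xc2 0xa9).
sample_rows = [
    (1, "plain ascii".encode("utf-8")),
    (2, b"Copyright \xa9 2010"),
    (3, "Copyright \u00a9 2010".encode("utf-8")),
]

bad = list(find_invalid_utf8(sample_rows))
for row_id, offset, byte in bad:
    print("row %d: invalid byte 0x%02x at offset %d" % (row_id, byte, offset))
```

This reports only the first bad byte in each value, which is enough to get
a list of row IDs in one pass instead of bisecting ID ranges by hand.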

Thanks
Sim

Responses

Re: finding rows with invalid characters at 2010-11-21 15:55:53 from Dmitriy Igrishin
Re: finding rows with invalid characters at 2010-11-30 19:08:47 from Jasen Betts
