Re: error while trying to change the database encoding on a database

From: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
To: Geoffrey Myers <lists(at)serioustechnology(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: error while trying to change the database encoding on a database
Date: 2011-01-24 18:40:36
Message-ID: 4D3DC7A4.9090707@gmail.com
Lists: pgsql-general

On 01/24/2011 09:16 AM, Geoffrey Myers wrote:

>
> We hope to identify the characters and fix them in the existing
> database, then convert. It appears to be very limited, but it would help
> if there was some way to identify these characters outside of simply
> doing the reload of the data and finding the errors.
>
> Hence the reason I asked about a resource that might identify the
> characters.

The problem is that, from the standpoint of the SQL_ASCII database, there
is nothing wrong with the characters per se. AFAIK there is no built-in
function to validate characters. The reason is that validity is determined
by the encoding, and if you know the encoding then you really don't need
to determine validity. If you want to see one way others have tackled
this, search for iconv in the mailing list archive. That approach requires
working on an external copy of the data and knowing something about the
encodings involved. The nearest thing I could ever find to an encoding
detector is:

http://chardet.feedparser.org/

It is a Python program, and the encodings it detects are limited, but it
might work for you.
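
As a rough illustration of working on an external copy of the data, here
is a minimal sketch that lists the lines of a dump containing bytes
outside the ASCII range and notes whether each line would already pass
as UTF-8. It assumes a plain-text dump and a UTF-8 target, neither of
which is stated above; find_non_ascii_lines and the sample bytes are
hypothetical names and data made up for the example.

```python
def find_non_ascii_lines(data: bytes):
    """Return (line_number, raw_line, is_valid_utf8) for each line
    that contains at least one byte >= 0x80."""
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        if any(b >= 0x80 for b in line):
            try:
                line.decode("utf-8")
                valid = True
            except UnicodeDecodeError:
                valid = False
            hits.append((line_no, line, valid))
    return hits


# Made-up sample standing in for the contents of a dump file.
sample = b"plain ascii\ncaf\xe9 latin1\ncaf\xc3\xa9 utf8\n"
for line_no, line, valid in find_non_ascii_lines(sample):
    print(line_no, line, "valid UTF-8" if valid else "NOT valid UTF-8")
```

For a real dump you would read the file in binary mode and pass its
contents in. Lines flagged as not valid are the ones a reload into a
UTF-8 database would be expected to choke on, assuming that is the
target.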

Given all the above, when I was faced with the problem you are facing I
found it easiest to make an educated guess as to the original encoding
and then do test restores with client_encoding set to my guess.
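
One way to inform that guess before running a test restore is to decode
a suspect byte string under a few candidate encodings and look at the
results. This is only a sketch: try_encodings and the candidate list are
assumptions for illustration, and the names are Python codec names, not
PostgreSQL encoding names (which are spelled differently, e.g. LATIN1 or
WIN1252).

```python
def try_encodings(raw: bytes, candidates=("utf-8", "cp1252", "latin-1")):
    """Decode raw under each candidate codec.

    Returns a dict mapping codec name to the decoded text, or to None
    when the bytes are not valid in that codec.
    """
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Made-up suspect value: "caf" followed by the single byte 0xE9.
for name, text in try_encodings(b"caf\xe9").items():
    print(name, "->", text if text is not None else "(invalid)")
```

Note that latin-1 accepts every byte value, so a successful decode there
proves nothing by itself; you still have to judge whether the resulting
text looks right.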


--
Adrian Klaver
adrian(dot)klaver(at)gmail(dot)com
