From: | Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com> |
---|---|
To: | Howard Cole <howardnews(at)selestial(dot)com> |
Cc: | "pgsql-general(at)postgresql(dot)org List" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: How to remove non-UTF values from a table? |
Date: | 2009-12-16 01:13:29 |
Message-ID: | e373d31e0912151713x4697a862j97d8f7d8d8a1c20a@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Tue, Dec 15, 2009 at 9:26 PM, Howard Cole <howardnews(at)selestial(dot)com> wrote:
> Phoenix Kiula wrote:
>>
>> An easy question for some I hope.
>>
>> I have a DB from 8.2 days that when I now dump and try to take into
>> the 8.3.7, it gives me errors about utf-8 stuff.
>>
>> I tried searching this list's archives but could not come up with an
>> answer.
>>
>> Google returns some sites like these:
>> http://sniptools.com/databases/finding-non-utf8-values-in-postgresql -
>> but I'm not clear on how to use them.
>>
>> Following the SQL on this site I could identify some columns that
>> contain text like this:
>>
>> "Évolution générale de la situation démographique"
>>
>> So my guess is that the non-English characters were originally not
>> getting written in proper utf-8 variants.
>>
>> Is there any SQL possibility to find these columns and replace them
>> with utf-8 equivalents using some postgresql commands? Couldn't find
>> anything in the "Strings functions" (chapter 9 of manual).
>>
>> We're on CentOS.
>>
>> Thanks!
>>
>>
>
> My recommendation would be to install the iconv utility and run it on a
> plain text (pg_dump -Fp) backup as suggested in the google article - and
> then reimport the clean UTF-8.
>
> I am surprised that you managed to install the original backup on 8.3
> because it seems to be much more strict on encoding - Unless your database
> is not in UTF-8?
Thanks Howard.
I ran the SQL and it finds anything that has non-English characters.
For example:
Part of this URL is actually in Japanese, but when I paste it in this
email it comes up with all these percentage signs. I suppose this is
"url encoded".
Shouldn't this be valid UTF-8? How does PG calculate if something is
not valid UTF-8?
Thanks.
From | Date | Subject | |
---|---|---|---|
Next Message | Craig Ringer | 2009-12-16 02:34:50 | Re: Possible causes for database corruption and solutions |
Previous Message | Scott Marlowe | 2009-12-16 01:07:48 | Re: Possible causes for database corruption and solutions |