Re: [ADMIN] what's the efficient/safest way to convert database character set ?

From: Steve Atkins <steve(at)blighty(dot)com>
To: "pgsql-general(at)postgresql(dot)org General" <pgsql-general(at)postgresql(dot)org>
Subject: Re: [ADMIN] what's the efficient/safest way to convert database character set ?
Date: 2013-10-18 00:07:33
Message-ID: 6E5BC504-F62C-4988-938E-9F86CBCA626D@blighty.com
Lists: pgsql-general


On Oct 17, 2013, at 3:13 PM, "Huang, Suya" <Suya(dot)Huang(at)au(dot)experian(dot)com> wrote:

> Hi,
>
> I’ve got a question of converting database from ascii to UTF-8, what’s the best approach to do so if the database size is very large? Detailed procedure or experience sharing are much appreciated!
>

The answer to that depends on what you mean by "ascii".

If your current database uses SQL_ASCII encoding, that's not ASCII. SQL_ASCII means no encoding has ever been enforced, so the database could have anything in there, including any mix of encodings, and there's no way of knowing from the database itself what they are. If you've had, for example, webapps that let people paste Word documents into them, you potentially have different encodings used in different rows of the same table.
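
To make that concrete, here is a small Python sketch (not from the original thread; the sample string and the helper name is_valid_utf8 are made up for illustration). It shows the same text arriving as different bytes depending on the client's encoding. A SQL_ASCII database would store both byte sequences unchanged.

```python
# Hypothetical sample: the same text as sent by two different clients.
text = "it\u2019s"  # "it's" with a curly apostrophe, as Word tends to produce

utf8_bytes = text.encode("utf-8")     # b'it\xe2\x80\x99s'
cp1252_bytes = text.encode("cp1252")  # b'it\x92s'


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(utf8_bytes))    # True
print(is_valid_utf8(cp1252_bytes))  # False
```

Both rows hold "the same" text, but only one of them would survive a straight load into a UTF-8 database.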

If your current data is like that, then you're probably looking at some (manual) data cleanup to work out what encoding your data is really in and to convert it to something consistent, rather than a simple migration from ASCII to UTF-8.
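
As a rough starting point for that cleanup, something like the following Python sketch could bucket raw values by what they can safely be. This is only an illustration under assumptions: the function name classify and the sample values are hypothetical, and getting the raw bytes out of your database is left out entirely. Anything that lands in "unknown" still needs a human to decide what encoding it was really written in.

```python
from collections import Counter


def classify(raw: bytes) -> str:
    """Bucket a raw byte string by what it could safely be."""
    try:
        raw.decode("ascii")
        return "ascii"    # pure 7-bit data is already valid UTF-8
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"    # decodes cleanly as UTF-8
    except UnicodeDecodeError:
        return "unknown"  # some other encoding; needs manual inspection


# Hypothetical values standing in for column data fetched as raw bytes.
samples = [
    b"plain text",
    "na\u00efve".encode("utf-8"),
    "it\u2019s".encode("cp1252"),
]

counts = Counter(classify(s) for s in samples)
print(counts)
```

If everything comes back as "ascii" or "utf-8", the migration is likely to be straightforward; a large "unknown" bucket tells you how much manual work is ahead.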

Cheers,
Steve
