Re: Best practices for moving UTF8 databases

From: Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-general(at)postgresql(dot)org, Jasen Betts <jasen(at)xnet(dot)co(dot)nz>
Subject: Re: Best practices for moving UTF8 databases
Date: 2009-07-19 02:16:17
Message-ID: e373d31e0907181916s7be46a45mcac18b91df6f367e@mail.gmail.com
Lists: pgsql-general

On Tue, Jul 14, 2009 at 9:52 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Andres Freund wrote:
>> On Tuesday 14 July 2009 11:36:57 Jasen Betts wrote:
>
>> > if you do an ascii dump and the dump starts out "SET CLIENT ENCODING
>> > 'UTF8'" or similar but you still get errors.
>> Do you mean that a dump from SQL_ASCII can yield non-UTF8 data? Right. But
>> according to the OP, his 8.3 database is UTF8,
>> so there should not be invalid data in there.
>
> I haven't followed this thread, but older PG versions had less strict
> checks on UTF8 data, which meant that some invalid data could creep in.

If so, how can I check for them in my old database, which is 8.2.9?
I'm moving to 8.3 first (then to 8.4).
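One way to look for such bytes is to dump the 8.2 database in pg_dump's plain-text (SQL) format and scan the file outside PostgreSQL. The Python sketch below assumes that setup; the script name `find_bad_utf8.py` and its functions are hypothetical. It reads the dump line by line in binary mode and reports each line that does not decode as UTF-8, with the offset and value of the first offending bytes. Splitting on newlines is safe because byte 0x0A never occurs inside a valid multibyte UTF-8 sequence.

```python
"""Report lines of a plain-text dump that are not valid UTF-8.

Minimal sketch (hypothetical script name: find_bad_utf8.py). It assumes the
old database was dumped with pg_dump in plain SQL format to a file.
"""
import sys
from typing import Iterable, Iterator, Tuple


def find_invalid_utf8(lines: Iterable[bytes]) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (line_number, byte_offset, bad_bytes) for the first invalid
    sequence on each line that fails to decode as UTF-8."""
    for lineno, line in enumerate(lines, start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            yield lineno, exc.start, line[exc.start:exc.end]


def main(path: str) -> int:
    bad_lines = 0
    # Binary mode: iterating the file yields raw byte lines split on b"\n".
    with open(path, "rb") as dump:
        for lineno, offset, bad in find_invalid_utf8(dump):
            bad_lines += 1
            print(f"line {lineno}, byte {offset}: {bad!r}")
    print(f"{bad_lines} line(s) with invalid UTF-8")
    return 1 if bad_lines else 0


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

Run it as `python find_bad_utf8.py dump.sql`; a non-zero exit status means at least one bad line was found. Only the first bad sequence on each line is reported, so a line may contain more.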

Really, PG absolutely needs a way to upgrade a database without so
much data-related downtime and all these silly woes. Several competing
database systems are a cinch to upgrade.

Anyway, this is the annoying error I keep seeing:

ERROR: invalid byte sequence for encoding "UTF8": 0x80
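For context: in UTF-8, a byte of the form 10xxxxxx (0x80 to 0xBF) is a continuation byte and can never start a character, so a lone 0x80 is rejected outright. In Windows-1252 the same byte is the euro sign, which is one common way such bytes end up in a database meant to be UTF-8; whether that is the origin here is only a guess. The small Python sketch below (the helper name `explain_utf8_error` is hypothetical) shows both readings of the byte.

```python
def explain_utf8_error(raw: bytes) -> str:
    """Describe the first byte sequence in raw that is not valid UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.reason is e.g. "invalid start byte"; exc.start is the byte offset.
        return f"{exc.reason} {raw[exc.start]:#04x} at offset {exc.start}"
    return "valid UTF-8"


sample = b"price: \x80 5"
print(explain_utf8_error(sample))  # invalid start byte 0x80 at offset 7
# Read as Windows-1252 instead, 0x80 is U+20AC (EURO SIGN).
print(sample.decode("cp1252") == "price: \u20ac 5")  # True
```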

I think my old DB is entirely UTF8. If a few characters are not,
how can I deal with them? I've done everything I can to take care
of the encoding settings. This is the command I used to run initdb:

initdb --locale=en_US.UTF-8 --encoding=UTF8

Locale environment variables are all "en_US.UTF-8" too.
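If a scan turns up only a handful of bad bytes, one option is to fix the plain-text dump before loading it into 8.3. The Python sketch below assumes the stray bytes are Windows-1252 text (falling back to Latin-1 for the five byte values Windows-1252 leaves undefined); that is a guess about the data, not something the error message proves, so review the repaired rows afterwards. Valid UTF-8 passes through unchanged, and only the bytes the UTF-8 decoder rejects are reinterpreted, via a custom error handler registered with Python's `codecs.register_error`. The script name `repair_utf8.py`, the functions `repair_line` and `repair_file`, and the handler name `cp1252_fallback` are all hypothetical.

```python
"""Re-encode stray non-UTF-8 bytes in a plain-text dump as UTF-8.

Assumption: the invalid bytes came from Windows-1252 text (where 0x80 is
the euro sign). Valid UTF-8 is passed through untouched; only the bytes
the UTF-8 decoder rejects are reinterpreted.
"""
import codecs
import sys


def _legacy_char(byte: int) -> str:
    """Decode one byte as Windows-1252, falling back to Latin-1 for the
    byte values Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D)."""
    try:
        return bytes([byte]).decode("cp1252")
    except UnicodeDecodeError:
        return bytes([byte]).decode("latin-1")


def _cp1252_fallback(exc: UnicodeError):
    # Called by the UTF-8 decoder for each run of bytes it cannot decode;
    # returns (replacement text, position to resume decoding from).
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(_legacy_char(b) for b in bad), exc.end


codecs.register_error("cp1252_fallback", _cp1252_fallback)


def repair_line(line: bytes) -> bytes:
    """Return line as valid UTF-8, reinterpreting only the invalid bytes."""
    return line.decode("utf-8", errors="cp1252_fallback").encode("utf-8")


def repair_file(src: str, dst: str) -> None:
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for line in fin:
            fout.write(repair_line(line))


if __name__ == "__main__" and len(sys.argv) == 3:
    repair_file(sys.argv[1], sys.argv[2])
```

For example, `python repair_utf8.py dump.sql dump_fixed.sql` writes a copy in which every line decodes as UTF-8. Byte sequences that already form valid UTF-8 are never touched, so text that was double-encoded at some earlier point would not be corrected by this approach.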

Thanks for any pointers!
