From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | henka(at)cityweb(dot)co(dot)za |
Cc: | "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: Locale/encoding problem/question |
Date: | 2006-08-04 12:59:51 |
Message-ID: | 5345.1154696391@sss.pgh.pa.us |
Lists: | pgsql-general |
henka(at)cityweb(dot)co(dot)za writes:
>> It should be in the dump file, almost the first line. Locale is of no
>> interest to pg_dump, you'll have to decide how you want it.
> Yes: UTF-8 and the other is LATIN1
Note that this represents what the original server *thought* the
encoding was. But it's not at all impossible that the server thought
the data was LATIN1 when it was really UTF8. (The other way around is
less plausible because the server would have been able to detect
encoding errors.) If you were using clients that treated the data
as UTF8 without paying attention to what the server thought, you'd
not have realized you were mislabeling the data.
But, if you tried to load data marked as LATIN1 into a server using
UTF8, it'd have applied a LATIN1 to UTF8 conversion, and then
everything's hosed.
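
To make the failure mode above concrete, here is a minimal Python sketch of what a LATIN1-to-UTF8 conversion does to bytes that were already UTF8. The function name is hypothetical, and this only imitates the effect with Python's codecs; it is not the server's conversion code.

```python
def mislabeled_conversion(utf8_bytes: bytes) -> bytes:
    """Imitate a LATIN1 -> UTF8 conversion applied to bytes that are really UTF8.

    Every input byte is read as one LATIN1 character and then re-encoded
    as UTF8, so each multi-byte UTF8 sequence gets encoded a second time.
    """
    return utf8_bytes.decode("latin-1").encode("utf-8")


original = "é".encode("utf-8")             # b'\xc3\xa9'
damaged = mislabeled_conversion(original)  # b'\xc3\x83\xc2\xa9'
print(damaged.decode("utf-8"))             # prints "Ã©"
```

The result is still valid UTF8, which is why the server raises no error at load time and the damage only shows up when someone reads the text.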
I'd suggest actually inspecting the data in the dump file: it's not that
hard to tell UTF8 from LATIN1 if you look at the byte sequences.
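
One way to do that inspection, sketched in Python under the assumption that you have read the dump (or a sample of it) into memory as raw bytes. The function name and the three labels are made up for illustration. The check relies on the fact that LATIN1 text with accented characters is very unlikely to form valid UTF8 multi-byte sequences by accident.

```python
def guess_encoding(data: bytes) -> str:
    """Classify raw bytes as 'ascii', 'utf8', or 'not-utf8'.

    'ascii' means no byte has the high bit set, so the data reads the
    same under LATIN1 and UTF8 and the label does not matter.
    'not-utf8' means some high-bit byte sequence is invalid UTF8, which
    is what LATIN1 data with accented characters normally looks like.
    """
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf8"
    return "utf8"
```

For example, `guess_encoding("café".encode("latin-1"))` gives `'not-utf8'`, while the same word encoded as UTF8 gives `'utf8'`. A result of `'utf8'` on a file labeled LATIN1 is a strong hint that the label is wrong.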
Or you could just take the file marked LATIN1, edit it to change the
client_encoding setting to say the data is UTF8, and see if you can
load it. If it's not UTF8, 8.1.4 will almost certainly detect a ton of
encoding errors.
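
If you would rather script the edit than do it by hand, something like the following Python sketch could work. It assumes the dump is a plain-text one containing a line of the form `SET client_encoding = 'LATIN1';` near the top; check your own file first, since the exact form is an assumption here. The function name is hypothetical. It works on bytes so that the data itself is never re-encoded.

```python
import re

# Assumed form of the encoding line in a plain-text dump; verify against your file.
_ENCODING_LINE = re.compile(rb"^(SET client_encoding = ')[^']*(';)", re.MULTILINE)


def relabel_dump(dump: bytes, new_encoding: str = "UTF8") -> bytes:
    """Change only the first client_encoding label; leave every data byte alone."""
    replacement = rb"\g<1>" + new_encoding.encode("ascii") + rb"\g<2>"
    return _ENCODING_LINE.sub(replacement, dump, count=1)
```

Write the result to a new file, keep the original untouched, and load the copy into a scratch database to see whether encoding errors are reported.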
regards, tom lane