Re: MSSQL to PostgreSQL : Encoding problem

From: Arnaud Lesauvage <thewild(at)freesurf(dot)fr>
To: Arnaud Lesauvage <thewild(at)freesurf(dot)fr>, Tomi NA <hefest(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: MSSQL to PostgreSQL : Encoding problem
Date: 2006-11-22 14:34:34
Message-ID: 45645FFA.2040006@freesurf.fr
Lists: pgsql-general

Alvaro Herrera wrote:
> Arnaud Lesauvage wrote:
>> Alvaro Herrera wrote:
>> >Arnaud Lesauvage wrote:
>> >>Tomi NA wrote:
>> >>>>I think I'll go this way... No other choice, actually !
>> >>>>The MSSQL database is in SQL_Latin1_General_CP1_CI_AS.
>> >>>>I don't really understand what this is. It supports the euro
>> >>>>symbol, so it is probably not pure LATIN1, right ?
>> >>>
>> >>>I suppose you'd have to look at the latin1 codepage character table
>> >>>somewhere...I'm a UTF-8 guy so I'm not well suited to respond to the
>> >>>question. :)
>> >>
>> >>Yep, http://en.wikipedia.org/wiki/Latin-1 tells me that
>> >>LATIN1 is missing the euro sign...
>> >>Grrrrr I hate this !!!
>> >
>> >So use Latin9 ...
>>
>> Of course, but it doesn't work!
>> Whatever client encoding I choose in PostgreSQL before
>> COPYing, I get the 'invalid byte sequence' error.
>
> Humm ... how are you choosing the client encoding? Is it actually
> working? I don't see how choosing Latin1 or Latin9 and feeding whatever
> byte sequence would give you an "invalid byte sequence". These charsets
> don't have any way to validate the bytes, as opposed to what UTF-8 can
> do. So you could end up with invalid bytes if you choose the wrong
> client encoding, but that's a different error.
>

mydb=# SET client_encoding TO LATIN9;
SET
mydb=# COPY statistiques.detailrecherche (log_gid,
champrecherche, valeurrecherche) FROM
'E:\\Production\\Temp\\detailrecherche_ansi.csv' CSV;
ERROR: invalid byte sequence for encoding "LATIN9": 0x00
HINT: This error can also happen if the byte sequence does
not match the encoding expected by the server, which is
controlled by "client_encoding".
CONTEXT: COPY detailrecherche, line 9212
mydb=# SET client_encoding TO WIN1252;
SET
mydb=# COPY statistiques.detailrecherche (log_gid,
champrecherche, valeurrecherche) FROM
'E:\\Production\\Temp\\detailrecherche_ansi.csv' CSV;
ERROR: invalid byte sequence for encoding "WIN1252": 0x00
HINT: This error can also happen if the byte sequence does
not match the encoding expected by the server, which is
controlled by "client_encoding".
CONTEXT: COPY detailrecherche, line 9212
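
Both attempts fail on the same byte, 0x00, at the same line. PostgreSQL
does not accept a NUL byte in text data under any encoding, so switching
between LATIN9 and WIN1252 would not be expected to change this error.
A minimal sketch, assuming Python is available on the machine holding
the file, for listing the physical lines that contain a NUL byte. The
function name find_nul_lines is hypothetical. Lines are counted by
newline bytes, so the numbers may drift from the ones COPY reports if
quoted CSV fields contain embedded line breaks.

```python
def find_nul_lines(path, limit=10):
    """Return up to `limit` (line_number, offset_in_line) pairs for
    physical lines that contain a 0x00 byte.

    The file is read in binary mode so that no decoding happens and
    the raw bytes are inspected exactly as COPY would receive them.
    """
    hits = []
    with open(path, "rb") as f:
        for lineno, line in enumerate(f, start=1):
            pos = line.find(b"\x00")
            if pos != -1:
                hits.append((lineno, pos))
                if len(hits) >= limit:
                    break
    return hits


# Example call (path taken from the COPY command above):
# print(find_nul_lines(r"E:\Production\Temp\detailrecherche_ansi.csv"))
```

If the hits cluster around line 9212, the NUL bytes are in the exported
data itself and would have to be stripped or replaced before COPY can
load the file.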

Really, I'd rather get a different error, but this is all I
get, whichever of the two encodings I set.
That was with the "ANSI" export.
With the "UNICODE" export:

mydb=# SET client_encoding TO UTF8;
SET
mydb=# COPY statistiques.detailrecherche (log_gid,
champrecherche, valeurrecherche) FROM
'E:\\Production\\Temp\\detailrecherche_unicode.csv' CSV;
ERROR: invalid byte sequence for encoding "UTF8": 0xff
HINT: This error can also happen if the byte sequence does
not match the encoding expected by the server, which is
controlled by "client_encoding".
CONTEXT: COPY detailrecherche, line 592680

So, getting as far as line 592680 is *a lot* better, but it still fails!
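
The byte 0xff can never appear in valid UTF-8, so the "UNICODE" export
is not pure UTF-8 at that point. A minimal sketch, again assuming Python
is at hand, that checks whether the file starts with a byte order mark
and reports the first byte sequence that fails to decode as UTF-8. The
names sniff_bom and first_invalid_utf8 are hypothetical. Splitting on
newline bytes is safe for UTF-8 because 0x0A never occurs inside a
multi-byte sequence.

```python
import codecs


def sniff_bom(path):
    """Return a codec name if the file starts with a known BOM, else None.

    Only UTF-8 and UTF-16 marks are checked; a UTF-32 LE file would be
    reported as utf-16-le because its BOM begins with the same two bytes.
    """
    with open(path, "rb") as f:
        head = f.read(4)
    for name, bom in (
        ("utf-8-sig", codecs.BOM_UTF8),
        ("utf-16-le", codecs.BOM_UTF16_LE),
        ("utf-16-be", codecs.BOM_UTF16_BE),
    ):
        if head.startswith(bom):
            return name
    return None


def first_invalid_utf8(path):
    """Return (line_number, offset_in_line, byte_value) for the first
    byte sequence that is not valid UTF-8, or None if the file is clean."""
    with open(path, "rb") as f:
        for lineno, line in enumerate(f, start=1):
            try:
                line.decode("utf-8")
            except UnicodeDecodeError as exc:
                return lineno, exc.start, line[exc.start]
    return None


# Example calls (path taken from the COPY command above):
# print(sniff_bom(r"E:\Production\Temp\detailrecherche_unicode.csv"))
# print(first_invalid_utf8(r"E:\Production\Temp\detailrecherche_unicode.csv"))
```

Looking at the raw bytes around the reported offset should show whether
the problem is a single stray byte or a stretch of text in another
encoding.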

--
Arnaud
