Re: MSSQL to PostgreSQL : Encoding problem

From: Richard Huxton <dev(at)archonet(dot)com>
To: Tony Caduto <tony_caduto(at)amsoftwaredesign(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: MSSQL to PostgreSQL : Encoding problem
Date: 2006-11-22 01:30:15
Message-ID: 4563A827.8080809@archonet.com
Lists: pgsql-general

Tony Caduto wrote:
> Arnaud Lesauvage wrote:
>>
>>
>>> I then try to import into PostgreSQL. The farthest I can get is when
>>> using the UNICODE export, and importing it using a client_encoding
>>> set to UTF8 (I tried WIN1252, LATIN9, LATIN1, ...).
>>> The copy then stops with an error:
>>> ERROR: invalid byte sequence for encoding "UTF8": 0xff
>>> SQL state: 22021
>>>
>>> The problematic character is the euro currency symbol.
>>
>>
> Exporting from MS SQL Server as Unicode is going to give you full
> Unicode, not UTF8. Full Unicode is 2 bytes per character and UTF8 is 1,
> same as ASCII.
> You will have to encode the Unicode data to UTF8.

Well, UTF8 is a minimum of one byte per character, but takes two to four
bytes for non-ASCII characters. The idea is that code points below 128
are encoded exactly as in ASCII. There's also UTF16, with two or four
bytes per character, and UTF32, which always uses four.
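
To make those sizes concrete, here is a small Python sketch (Python is
my choice for illustration, nobody in the thread used it). The helper
name encoded_lengths is made up. It encodes one character at a time and
counts the bytes in each encoding.

```python
# Count how many bytes one character takes in each Unicode encoding.
# encoded_lengths is a hypothetical helper, named here for illustration.

def encoded_lengths(ch):
    # The "-le" codecs are used so that no byte-order mark is prepended,
    # which would otherwise inflate the UTF-16 and UTF-32 counts.
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }

# An ASCII letter, an accented letter, the euro sign, and a character
# outside the Basic Multilingual Plane (U+1D11E, musical G clef).
for ch in ["A", "\u00e9", "\u20ac", "\U0001d11e"]:
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
```

The euro sign from the original report comes out as three bytes in UTF8
and two in UTF16, so neither encoding is a fixed one byte per character.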

> I have done this in Delphi using its built-in UTF8 encoding and
> decoding routines. You can get a free copy of Delphi Turbo Explorer
> which includes components for MS SQL Server and ODBC, so it would be
> pretty straightforward to get this working.
>
> The actual method in Delphi is system.UTF8Encode(widestring). This will
> encode Unicode to UTF8 which is compatible with a PostgreSQL UTF8 database.

Ah, that's useful to know. Windows just doesn't have the same quantity
of tools installed as a *nix platform.
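
For anyone without Delphi, the same re-encoding step can be sketched in
Python. This assumes the MSSQL "Unicode" export is UTF16 with a
byte-order mark, which would fit the error above: a little-endian BOM
is the bytes FF FE, and 0xff is never valid in UTF8. The function name
and the file names are made up.

```python
# Re-encode a UTF-16 text file as UTF-8, reading it in chunks so that a
# large export does not have to fit in memory.
# utf16_to_utf8 is a hypothetical helper, not part of any library.

def utf16_to_utf8(src_path, dst_path, chunk_size=64 * 1024):
    # The "utf-16" codec reads the BOM to pick the byte order and does
    # not pass it on to the output. Assumption: the file starts with a
    # BOM; without one Python falls back to the platform byte order.
    # newline="" keeps line endings exactly as they are in the source.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(chunk_size)  # chunk_size is in characters
            if not chunk:
                break
            dst.write(chunk)

# Example call (file names are placeholders):
#   utf16_to_utf8("export_unicode.txt", "export_utf8.txt")
```

The output file can then be loaded with client_encoding set to UTF8.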

> I am sure Perl could do it also.

And in one line if you're clever enough no doubt ;-)

--
Richard Huxton
Archonet Ltd
