Re: Trouble with UTF-8 data

From:	"Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To:	"Janine Sisk EXTERN" <janine(at)furfly(dot)net>
Cc:	<pgsql-general(at)postgresql(dot)org>
Subject:	Re: Trouble with UTF-8 data
Date:	2008-01-21 08:15:44
Message-ID:	D960CB61B694CF459DCFB4B0128514C2CC2A65@exadv11.host.magwien.gv.at
Lists:	pgsql-general

Janine Sisk wrote:
>> 0xEDA7A1 (UTF-8) corresponds to UNICODE code point 0xD9E1, which,
>> when interpreted as a high surrogate and followed by a low surrogate,
>> would correspond to the UTF-16 encoding of a code point
>> between 0x88400 and 0x887FF (depending on the value of the low surrogate).
>>
>> These code points do not correspond to any valid character.
>> So - unless there is a flaw in my reasoning - there's something
>> fishy with these data anyway.
>>
>> Janine, could you give us a hex dump of that line from the COPY statement?
>
> Certainly. Do you want to see it as it came from the old database,
> or after I ran it through iconv? Although iconv wasn't able to solve
> this problem, it did fix others in other tables; unfortunately I have
> no way of knowing if it also mangled some data at the same time.

Both; but the "before" dump is of course more likely to give a clue.
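The byte arithmetic in the quoted reasoning can be checked with a short sketch, and the same logic can scan a dump for other surrogates encoded as if they were ordinary characters. It is written in Python; decode_utf8_3byte, surrogate_pair_range and find_encoded_surrogates are hypothetical helpers, not part of PostgreSQL or iconv, and the scanner assumes the dumped line has been read into a bytes object. The decoding is done by hand because Python's own UTF-8 codec refuses surrogate code points.

```python
from __future__ import annotations


def decode_utf8_3byte(b: bytes) -> int:
    """Decode a 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx) by hand.

    No check for surrogates is made, so U+D800..U+DFFF come out as-is.
    """
    if (len(b) != 3 or (b[0] & 0xF0) != 0xE0
            or (b[1] & 0xC0) != 0x80 or (b[2] & 0xC0) != 0x80):
        raise ValueError("not a 3-byte UTF-8 sequence")
    return ((b[0] & 0x0F) << 12) | ((b[1] & 0x3F) << 6) | (b[2] & 0x3F)


def surrogate_pair_range(high: int) -> tuple[int, int]:
    """Code points a high surrogate can form under UTF-16 pairing rules."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate")
    base = 0x10000 + ((high - 0xD800) << 10)
    # A low surrogate 0xDC00..0xDFFF contributes 0..0x3FF.
    return base, base + 0x3FF


def find_encoded_surrogates(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, code point) for each ED A0..BF 80..BF triple,
    i.e. each surrogate written out as a 3-byte UTF-8 sequence."""
    hits = []
    for i in range(len(data) - 2):
        if (data[i] == 0xED and 0xA0 <= data[i + 1] <= 0xBF
                and 0x80 <= data[i + 2] <= 0xBF):
            hits.append((i, decode_utf8_3byte(data[i:i + 3])))
    return hits


cp = decode_utf8_3byte(bytes.fromhex("EDA7A1"))
print(hex(cp))  # 0xd9e1
low, high = surrogate_pair_range(cp)
print(hex(low), hex(high))  # 0x88400 0x887ff

# Made-up sample: "abc", then the bytes from the report, then ED B0 80.
sample = b"abc" + bytes.fromhex("EDA7A1EDB080")
for offset, code_point in find_encoded_surrogates(sample):
    print(offset, f"U+{code_point:04X}")  # 3 U+D9E1, then 6 U+DC00
```

If the "before" dump shows a high surrogate immediately followed by a low one like this, the data may have been written by a CESU-8 style encoder that encodes each half of a UTF-16 pair separately; that is one possible explanation to check, not a diagnosis.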

Yours,
Laurenz Albe

In response to

Re: Trouble with UTF-8 data at 2008-01-18 18:09:04 from Janine Sisk