From: | Janine Sisk <janine(at)furfly(dot)net> |
---|---|
To: | Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Trouble with UTF-8 data |
Date: | 2008-01-18 18:09:04 |
Message-ID: | 305D0D29-63FE-4EA3-8524-039B9E69B884@furfly.net |
Lists: | pgsql-general |
On Jan 18, 2008, at 12:00 AM, Albe Laurenz wrote:
> 0xEDA7A1 (UTF-8) corresponds to Unicode code point 0xD9E1, which,
> when interpreted as a high surrogate and followed by a low surrogate,
> would correspond to the UTF-16 encoding of a code point
> between 0x88400 and 0x887FF (depending on the value of the low
> surrogate).
>
> These code points do not correspond to any valid character.
> So - unless there is a flaw in my reasoning - there's something
> fishy with these data anyway.
>
> Janine, could you give us a hex dump of that line from the copy
> statement?
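For anyone following the arithmetic in the quoted message, here is a minimal Python sketch that reproduces it. The variable names and the helper `is_strict_utf8` are illustrative only and are not part of any PostgreSQL or iconv tooling; the sketch assumes nothing beyond the three bytes quoted above.

```python
# Hand-decode the 3-byte UTF-8 sequence ED A7 A1 and compute the range of
# supplementary code points it could address if used as a high surrogate.
raw = bytes([0xED, 0xA7, 0xA1])

# 3-byte UTF-8 layout: 1110xxxx 10xxxxxx 10xxxxxx
code_point = ((raw[0] & 0x0F) << 12) | ((raw[1] & 0x3F) << 6) | (raw[2] & 0x3F)

# High surrogates occupy 0xD800..0xDBFF.
is_high_surrogate = 0xD800 <= code_point <= 0xDBFF

# UTF-16 pairing: 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00),
# where the low surrogate ranges over 0xDC00..0xDFFF.
lowest = 0x10000 + ((code_point - 0xD800) << 10)
highest = lowest + 0x3FF


def is_strict_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(hex(code_point), is_high_surrogate, hex(lowest), hex(highest))
print(is_strict_utf8(raw))
```

Python's strict UTF-8 decoder rejects encoded surrogates, so the last line reports that the sequence is not valid UTF-8, which is consistent with the server refusing the data.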
Certainly. Do you want to see it as it came from the old database,
or after I ran it through iconv? Although iconv wasn't able to solve
this problem, it did fix others in other tables; unfortunately, I have
no way of knowing whether it also mangled some data at the same time.
The version of iconv I have does know about UTF16 so I tried using
that as the "from" encoding instead of UTF8, but the result had new
errors in places where the original data was good, so that was
obviously a step backwards.
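One way to compare the original dump with the iconv output is to list which lines fail strict UTF-8 decoding in each and look at their bytes. The following is a rough sketch only: `find_invalid_utf8` is a hypothetical helper, and the sample bytes stand in for the contents of a real dump file, which would be read in binary mode.

```python
def find_invalid_utf8(data: bytes):
    """Return (line_number, byte_offset_in_line, hex_dump_of_line) for
    every line that does not decode as strict UTF-8."""
    bad = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the offset of the first offending byte in the line.
            bad.append((line_no, err.start, line.hex(" ")))
    return bad


# Stand-in for the bytes of a dump file; the second line embeds ED A7 A1.
sample = b"good line\nbad \xed\xa7\xa1 line\n"

for line_no, offset, dump in find_invalid_utf8(sample):
    print(line_no, offset, dump)
```

Running the same check over both versions of the file would show whether iconv introduced new invalid sequences, though it cannot detect data that was changed into different but still valid UTF-8.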
BTW, in case it matters, I found out I misidentified the version of PG
this data came from; it's actually 7.3.6.
thanks,
janine