From: | Markus Bertheau <twanger(at)bluetwanger(dot)de> |
---|---|
To: | jm(at)poure(dot)com |
Cc: | Claudio Cicali <c(dot)cicali(at)mclink(dot)it>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: Mixed UTF8 / Latin1 database |
Date: | 2004-04-18 08:51:16 |
Message-ID: | 1082278275.2058.8.camel@yarrow.bertheau.de |
Lists: | pgsql-general |
On Fri, 16.04.2004, at 16:26, Jean-Michel POURE wrote:
> > I'm wondering if anyone could have a script or something to help me
> > with this situation... :(
>
> Knowing that Unicode is composed of plain ASCII characters,
It's not. Unicode is just a mapping from numbers (code points) to
characters, and UTF-8, which is probably what you mean, uses the eighth
bit too, which cannot be said of ASCII: ASCII is a 7-bit character set.
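A minimal sketch of the 7-bit point, in Python rather than PHP purely for
illustration. The helper name uses_eighth_bit is made up for this example:

```python
def uses_eighth_bit(data: bytes) -> bool:
    """Return True if any byte has its high (eighth) bit set."""
    return any(b & 0x80 for b in data)

# Pure ASCII text never sets the eighth bit.
ascii_bytes = "abc".encode("ascii")

# A non-ASCII character encoded as UTF-8 becomes a multi-byte sequence
# in which every byte has the eighth bit set.
utf8_bytes = "ä".encode("utf-8")

print(uses_eighth_bit(ascii_bytes))
print(uses_eighth_bit(utf8_bytes))
```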
> you may perform a conversion test using PHP "recode" function.
>
> You may test each record as follows:
>
> $test = recode ("latin1..u8", $record);
>
> If the $test value differs from the $record one, then it is a Latin1 string.
Unfortunately, unless $record is all ASCII, $test and $record will always
differ, because every byte stream is valid latin1 and can thus be
converted to UTF-8.
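To illustrate why the comparison cannot tell the two encodings apart, here
is a small sketch in Python (not PHP's recode; latin1_to_utf8 is a
hypothetical helper standing in for the "latin1..u8" conversion):

```python
def latin1_to_utf8(record: bytes) -> bytes:
    """Reinterpret the bytes as Latin-1 and re-encode them as UTF-8.

    Decoding as Latin-1 never fails, because every byte value maps to
    a character.
    """
    return record.decode("latin-1").encode("utf-8")

ascii_record = b"plain text"
latin1_record = "ä".encode("latin-1")  # a single byte, 0xE4
utf8_record = "ä".encode("utf-8")      # two bytes, 0xC3 0xA4

for record in (ascii_record, latin1_record, utf8_record):
    changed = latin1_to_utf8(record) != record
    print(record, "changed by conversion:", changed)
```

Only the all-ASCII record comes back unchanged. The record that was
already UTF-8 is changed as well (it gets encoded a second time), so
"differs after conversion" does not mean "was Latin1".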
What you can do is replace ä, ü, ö, § and all other non-ASCII byte
values with their UTF-8 equivalents and then go through the data
manually to check it for correctness.
Another heuristic you can use in your program is to check whether your
input is valid UTF-8 and to convert from latin1 only if it is not. You
can probably check for valid UTF-8 by seeing whether a conversion from
UTF-8 to UTF-8 succeeds.
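A rough sketch of that check, again in Python for illustration. The
function name to_utf8 is an assumption of this example, and a strict
UTF-8 decode is used in place of a "UTF-8 to UTF-8" conversion:

```python
def to_utf8(record: bytes) -> bytes:
    """Return the record as UTF-8, converting from Latin-1 only if needed."""
    try:
        # A strict decode raises UnicodeDecodeError on invalid UTF-8.
        record.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: assume Latin-1, which can decode any byte stream.
        return record.decode("latin-1").encode("utf-8")
    # Already valid UTF-8 (this includes pure ASCII): leave it alone.
    return record

print(to_utf8("ä".encode("latin-1")))  # converted
print(to_utf8("ä".encode("utf-8")))    # left unchanged
```

This is only a heuristic. A Latin1 string whose bytes happen to form
valid UTF-8 (for example the two characters "Ã¤") would be left
unconverted, so the result still deserves a manual look.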
--
Markus Bertheau <twanger(at)bluetwanger(dot)de>