From: | Patrice Hédé <patrice(at)idf(dot)net> |
---|---|
To: | Jean-Michel POURE <jm(dot)poure(at)freesurf(dot)fr> |
Cc: | Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, pgsql-odbc(at)postgresql(dot)org, Inoue(at)tpf(dot)co(dot)jp |
Subject: | Re: UTF-8 data migration problem in Postgresql 7.2 |
Date: | 2002-02-21 18:19:11 |
Message-ID: | 20020221181911.GB19184@idf.net |
Lists: | pgsql-hackers pgsql-odbc |
Hi Jean-Michel,
I just started browsing this list again after a long absence...
* Jean-Michel POURE <jm(dot)poure(at)freesurf(dot)fr> [020221 18:39]:
> 5) Surrogate pairs
> I heard PostgreSQL did not support surrogate pairs. Is this a problem of
> surrogate pair? Just my 0.02 cents, I know very little about UTF-8.
Surrogate pairs only exist in UTF-16. They are used to represent
characters which are not on the BMP (Basic Multilingual Plane,
U+0000 to U+FFFF).
UTF-8 has a different way to encode these characters: a single 4-byte
sequence. Encoding surrogates in UTF-8 is invalid and should be
rejected by any application receiving a UTF-8 stream (they used to be
merely irregular, but starting with Unicode 3.2 they will be illegal).
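To make the distinction concrete, here is a small Python sketch (Python 3.9+ assumed; `utf16_surrogates` is a hypothetical helper written only for this illustration). It takes U+1D11E MUSICAL SYMBOL G CLEF, a character outside the BMP, splits it into its UTF-16 surrogate pair, and compares that with its single 4-byte UTF-8 form. It also shows Python's strict UTF-8 codec refusing both a lone surrogate and a surrogate smuggled in as the bytes ED A0 80.

```python
def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points outside the BMP need surrogates")
    v = cp - 0x10000                       # 20 bits left to distribute
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

ch = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP

print([hex(u) for u in utf16_surrogates(ord(ch))])  # ['0xd834', '0xdd1e']
print(ch.encode("utf-16-be").hex())                 # d834dd1e (two 16-bit units)
print(ch.encode("utf-8").hex())                     # f09d849e (one 4-byte sequence)

# Python's strict UTF-8 codec will not encode a lone surrogate...
try:
    "\ud834".encode("utf-8")
except UnicodeEncodeError:
    print("encoding a lone surrogate: rejected")

# ...nor decode a surrogate that was "encoded" into UTF-8 (ED A0 80 = U+D800).
try:
    b"\xed\xa0\x80".decode("utf-8")
except UnicodeDecodeError:
    print("decoding ED A0 80: rejected")
```

In UTF-16 the character needs two code units, while UTF-8 encodes the code point directly, so surrogates never have a legitimate reason to appear in a UTF-8 stream.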
Regarding your sequence E3/82/27, it cannot be valid under any scheme.
UTF-8 is designed so that every byte after the first one in a
multi-byte sequence is in the range 0x80 to 0xBF. For a lead byte of E3
in particular, the 2nd and 3rd bytes both have to be between 80 and BF,
and 27 (an ASCII apostrophe) is not.
Anyway, "UTF-8 encoded surrogates" can only start with ED, so that's
not your problem here.
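The byte-range rules above can be checked mechanically. Below is a minimal Python sketch that follows the well-formed UTF-8 byte sequence table in the Unicode Standard; `first_invalid_utf8` is a hypothetical helper for illustration, not PostgreSQL's own validator. It returns the offset of the first offending byte, or `None` if the input is well formed, and it rejects encoded surrogates by limiting the byte after ED to 80..9F.

```python
from typing import Optional

def first_invalid_utf8(data: bytes) -> Optional[int]:
    """Return the offset of the first ill-formed byte, or None if valid."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                       # plain ASCII
            i += 1
            continue
        # (continuation bytes needed, allowed range for the 2nd byte)
        if 0xC2 <= b <= 0xDF:
            need, lo, hi = 1, 0x80, 0xBF
        elif b == 0xE0:
            need, lo, hi = 2, 0xA0, 0xBF   # excludes overlong forms
        elif 0xE1 <= b <= 0xEC or b in (0xEE, 0xEF):
            need, lo, hi = 2, 0x80, 0xBF
        elif b == 0xED:
            need, lo, hi = 2, 0x80, 0x9F   # A0..BF would encode a surrogate
        elif b == 0xF0:
            need, lo, hi = 3, 0x90, 0xBF   # excludes overlong forms
        elif 0xF1 <= b <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        elif b == 0xF4:
            need, lo, hi = 3, 0x80, 0x8F   # stays at or below U+10FFFF
        else:
            return i                       # 80..C1 and F5..FF never start a sequence
        for k in range(1, need + 1):
            if i + k >= n:
                return i + k               # truncated: a needed byte is missing
            c = data[i + k]
            lo_k, hi_k = (lo, hi) if k == 1 else (0x80, 0xBF)
            if not lo_k <= c <= hi_k:
                return i + k
        i += need + 1
    return None

for raw in ("e38227", "eda080", "c3a9"):
    print(raw, first_invalid_utf8(bytes.fromhex(raw)))
# e38227 2     (0x27 is not a continuation byte)
# eda080 1     (ED A0 .. is an encoded surrogate)
# c3a9 None    ("é", well formed)
```

Reporting the offset of the offending byte, rather than just a yes/no answer, makes it easier to locate bad data in a dump before loading it.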
Hope this helps.
Patrice
--
Patrice Hédé
email: patrice hede à islande org
www : http://www.islande.org/