From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)postgreSQL(dot)org |
Cc: | Tatsuo Ishii <ishii(at)postgreSQL(dot)org> |
Subject: | Errors in our encoding conversion tables |
Date: | 2015-11-26 20:30:31 |
Message-ID: | 11093.1448569831@sss.pgh.pa.us |
Lists: | pgsql-hackers |
There's a discussion over at
http://www.postgresql.org/message-id/flat/2sa(dot)Dhu5(dot)1hk1yrpTNFy(dot)1MLOlb(at)seznam(dot)cz
of an apparent error in our WIN1250 -> LATIN2 conversion. I looked into this
and found that indeed, the code will happily translate certain characters
for which there seems to be no justification. I made up a quick script
that would recompute the conversion tables in latin2_and_win1250.c from
the Unicode mapping files in src/backend/utils/mb/Unicode, and what it
computes is shown in the attached diff. (Zeroes in the tables indicate
codes with no translation, for which an error should be thrown.)
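Tom's actual script isn't shown, so the following is only a minimal sketch of the idea, not his code. It builds a single-byte table by composing source byte -> Unicode -> target byte. It assumes the mapping files use the Unicode Consortium layout (`0xXX<TAB>0xYYYY<TAB>#NAME`, with undefined codes listing no Unicode value). It also assumes the table covers only the high half, 0x80-0xFF, and that bytes below 0x80 pass through unchanged. The helper names (`load_mapping`, `build_table`, `convert`) are hypothetical.

```python
import io
from typing import Dict, List, TextIO


def load_mapping(f: TextIO) -> Dict[int, int]:
    """Parse a Unicode Consortium style mapping file (assumed layout:
    '0xXX<TAB>0xYYYY<TAB>#NAME') into {byte code: Unicode code point}."""
    mapping: Dict[int, int] = {}
    for line in f:
        line = line.split("#", 1)[0].strip()  # drop trailing comment
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line, or a code with no Unicode equivalent
        mapping[int(fields[0], 16)] = int(fields[1], 16)
    return mapping


def build_table(src: Dict[int, int], dst: Dict[int, int],
                lo: int = 0x80, hi: int = 0xFF) -> List[int]:
    """Compose src -> Unicode -> dst for codes lo..hi.
    A 0 entry means "no translation" (an error at conversion time)."""
    unicode_to_dst = {u: b for b, u in dst.items()}
    table: List[int] = []
    for code in range(lo, hi + 1):
        u = src.get(code)
        table.append(unicode_to_dst.get(u, 0) if u is not None else 0)
    return table


def convert(data: bytes, table: List[int], lo: int = 0x80) -> bytes:
    """Translate bytes with the table, raising on untranslatable codes.
    Assumes bytes below `lo` are identical in both encodings."""
    out = bytearray()
    for b in data:
        if b < lo:
            out.append(b)
            continue
        mapped = table[b - lo]
        if mapped == 0:
            raise ValueError(f"character 0x{b:02X} has no equivalent "
                             "in the target encoding")
        out.append(mapped)
    return bytes(out)


# Tiny excerpts standing in for the real CP1250 and ISO 8859-2 files.
win1250 = load_mapping(io.StringIO(
    "0x80\t0x20AC\t#EURO SIGN\n"
    "0x8A\t0x0160\t#LATIN CAPITAL LETTER S WITH CARON\n"))
latin2 = load_mapping(io.StringIO(
    "0xA9\t0x0160\t#LATIN CAPITAL LETTER S WITH CARON\n"))

table = build_table(win1250, latin2)
print(hex(table[0x8A - 0x80]))   # 0xa9
print(table[0x80 - 0x80])        # 0 (the euro sign has no LATIN2 code)
print(convert(b"\x8a", table))   # b'\xa9'
```

Deriving every entry from the Unicode data this way means a nonzero entry exists only when both encodings map the code to the same Unicode character. That is exactly the property the hand-made tables appear to lack.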
Having done that, I thought it would be a good idea to see if we had any
other conversion tables that weren't directly based on the Unicode data.
The only ones I could find were in cyrillic_and_mic.c, and those seem to
be absolutely filled with errors, to the point where I wonder if they were
made from the claimed encodings or some other ones. The attached patch
recomputes those from the Unicode data, too.
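Once a table has been recomputed, spotting which existing entries are unjustified is a plain element-by-element comparison. The sketch below is hypothetical, and the sample values are invented for illustration rather than taken from cyrillic_and_mic.c. It assumes both tables are Python lists indexed from code 0x80.

```python
from typing import List, Tuple


def diff_tables(existing: List[int], recomputed: List[int],
                base: int = 0x80) -> List[Tuple[int, int, int]]:
    """Return (source code, existing entry, recomputed entry) for every
    position where the two tables disagree."""
    if len(existing) != len(recomputed):
        raise ValueError("tables must have the same length")
    return [(base + i, old, new)
            for i, (old, new) in enumerate(zip(existing, recomputed))
            if old != new]


# Made-up example: the existing table translates 0x82, the Unicode data does not.
existing = [0x00, 0xA9, 0xB5]
recomputed = [0x00, 0xA9, 0x00]
for code, old, new in diff_tables(existing, recomputed):
    print(f"0x{code:02X}: table has 0x{old:02X}, Unicode data gives 0x{new:02X}")
# 0x82: table has 0xB5, Unicode data gives 0x00
```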
None of this data seems to have been touched since Tatsuo-san's original
commit 969e0246, so it looks like we simply didn't vet that submission
closely enough.
I have not attempted to reverify the files in utils/mb/Unicode against the
original Unicode Consortium data, but maybe we ought to do that before
taking any further steps here.
Anyway, what are we going to do about this? I'm concerned that simply
shoving in corrections may cause problems for users. Almost certainly,
we should not back-patch this kind of change.
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
encoding-conversion-corrections.patch | text/x-diff | 9.6 KB |