Errors in our encoding conversion tables

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: Tatsuo Ishii <ishii(at)postgreSQL(dot)org>
Subject: Errors in our encoding conversion tables
Date: 2015-11-26 20:30:31
Message-ID: 11093.1448569831@sss.pgh.pa.us
Lists: pgsql-hackers

There's a discussion over at
http://www.postgresql.org/message-id/flat/2sa(dot)Dhu5(dot)1hk1yrpTNFy(dot)1MLOlb(at)seznam(dot)cz
of an apparent error in our WIN1250 -> LATIN2 conversion. I looked into this
and found that indeed, the code will happily translate certain characters
for which there seems to be no justification. I made up a quick script
that would recompute the conversion tables in latin2_and_win1250.c from
the Unicode mapping files in src/backend/utils/mb/Unicode, and what it
computes is shown in the attached diff. (Zeroes in the tables indicate
codes with no translation, for which an error should be thrown.)
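The script itself isn't included in this message. For anyone who wants to reproduce the idea, here is a minimal Python sketch written under stated assumptions. It assumes the mapping files use the Unicode Consortium's tab-separated "0xXX<TAB>0xXXXX<TAB># NAME" layout. The helper names parse_mapping and build_table are hypothetical, and the real C tables in latin2_and_win1250.c may be laid out differently. The approach is to map each source byte to its Unicode code point, then find the destination byte with the same code point, emitting 0 where none exists.

```python
# Hypothetical sketch: recompute a single-byte -> single-byte conversion
# table by going through Unicode.  Assumes mapping files in the Unicode
# Consortium format ("0xXX<TAB>0xXXXX<TAB># NAME"); names are illustrative.

def parse_mapping(text):
    """Return {byte: code_point} parsed from mapping-file text."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line, or a byte with no Unicode equivalent
        mapping[int(fields[0], 16)] = int(fields[1], 16)
    return mapping


def build_table(src_map, dst_map, lo=0x80, hi=0xFF):
    """Destination byte for each source byte lo..hi; 0 means no translation."""
    # Invert the destination mapping (code point -> byte).  If two bytes
    # share a code point the last one wins; a real script should flag that.
    by_code_point = {cp: b for b, cp in dst_map.items()}
    return [by_code_point.get(src_map.get(b), 0) for b in range(lo, hi + 1)]


# Tiny excerpts of CP1250 and ISO 8859-2, for illustration only.
WIN1250 = """\
0x80\t0x20AC\t#EURO SIGN
0x8A\t0x0160\t#LATIN CAPITAL LETTER S WITH CARON
0x9A\t0x0161\t#LATIN SMALL LETTER S WITH CARON
"""
LATIN2 = """\
0xA9\t0x0160\t#LATIN CAPITAL LETTER S WITH CARON
0xB9\t0x0161\t#LATIN SMALL LETTER S WITH CARON
"""

table = build_table(parse_mapping(WIN1250), parse_mapping(LATIN2), 0x80, 0x9F)
for src in (0x80, 0x8A, 0x9A):
    print(f"0x{src:02X} -> 0x{table[src - 0x80]:02X}")
# 0x80 -> 0x00   (euro sign has no LATIN2 equivalent: error expected)
# 0x8A -> 0xA9
# 0x9A -> 0xB9
```

Going through Unicode in both directions means any byte without a genuine counterpart comes out as 0 rather than being silently translated to something plausible-looking.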

Having done that, I thought it would be a good idea to see if we had any
other conversion tables that weren't directly based on the Unicode data.
The only ones I could find were in cyrillic_and_mic.c, and those seem to
be absolutely filled with errors, to the point where I wonder if they were
made from the claimed encodings or some other ones. The attached patch
recomputes those from the Unicode data, too.
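One way to see how far a hand-written table strays from the Unicode data is to diff it entry by entry against a recomputed one. The following is a hypothetical Python sketch; table_diffs is an illustrative name, not anything in the PostgreSQL tree. It assumes both tables are plain lists indexed from the same starting byte, with 0 meaning "no translation".

```python
# Hypothetical helper: list every entry where an existing conversion table
# disagrees with one recomputed from the Unicode data.  Both tables are
# assumed to be lists covering source bytes base, base+1, ...; 0 = no mapping.

def table_diffs(existing, recomputed, base=0x80):
    if len(existing) != len(recomputed):
        raise ValueError("tables must cover the same byte range")
    diffs = []
    for i, (old, new) in enumerate(zip(existing, recomputed)):
        if old == new:
            continue
        if new == 0:
            kind = "unjustified translation"  # should raise an error instead
        elif old == 0:
            kind = "missing translation"
        else:
            kind = "wrong target"
        diffs.append((base + i, old, new, kind))
    return diffs


for src, old, new, kind in table_diffs([0x41, 0x00, 0x43], [0x41, 0x42, 0x00]):
    print(f"0x{src:02X}: 0x{old:02X} -> 0x{new:02X} ({kind})")
# 0x81: 0x00 -> 0x42 (missing translation)
# 0x82: 0x43 -> 0x00 (unjustified translation)
```

Splitting the differences by kind matters for the compatibility question: "unjustified translation" entries are the ones where a corrected table would start throwing errors on data that is converted without complaint today.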

None of this data seems to have been touched since Tatsuo-san's original
commit 969e0246, so it looks like we simply didn't vet that submission
closely enough.

I have not attempted to reverify the files in utils/mb/Unicode against the
original Unicode Consortium data, but maybe we ought to do that before
taking any further steps here.

Anyway, what are we going to do about this? I'm concerned that simply
shoving in corrections may cause problems for users. Almost certainly,
we should not back-patch this kind of change.

regards, tom lane

Attachment: encoding-conversion-corrections.patch (text/x-diff, 9.6 KB)
