| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | pgsql-hackers(at)postgreSQL(dot)org |
| Cc: | Tatsuo Ishii <ishii(at)postgreSQL(dot)org> |
| Subject: | Re: Errors in our encoding conversion tables |
| Date: | 2015-11-28 20:24:22 |
| Message-ID: | 32464.1448742262@sss.pgh.pa.us |
| Lists: | pgsql-hackers |
I wrote:
> There's a discussion over at
> http://www.postgresql.org/message-id/flat/2sa(dot)Dhu5(dot)1hk1yrpTNFy(dot)1MLOlb(at)seznam(dot)cz
> of an apparent error in our WIN1250 -> LATIN2 conversion.
Attached is an updated patch (against today's HEAD) showing proposed
changes to bring cyrillic_and_mic.c and latin2_and_win1250.c into sync
with the Unicode Consortium's conversion data.
In addition, I've attached the C program I used to generate the proposed
new conversion tables from the Unicode/*.map files, a simple SQL script
to print out the conversion behavior for the affected conversions, and
a diff of the script's output between 9.5 and the proposed patch.
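Here is a minimal C sketch of what such a generator does. It is not buildmap.c. It reads the Unicode Consortium's plain-text mapping layout, with lines like `0xA1<TAB>0x0104<TAB># comment` as in 8859-2.TXT and CP1250.TXT. It does not read PostgreSQL's generated Unicode/*.map files. The names `parse_map`, `build_table`, and `NO_MAPPING` are hypothetical. The sketch maps each high-half byte of the source charset to its Unicode code point, then finds the target byte with the same code point. A byte with no counterpart gets 0, which this sketch treats as "untranslatable".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical marker for "this byte has no Unicode mapping". */
#define NO_MAPPING (~0u)

/* Fill to_unicode[256] from the text of a Unicode Consortium mapping file.
 * Bytes that never appear (or appear as "#UNDEFINED") stay NO_MAPPING. */
static void parse_map(const char *text, unsigned to_unicode[256])
{
    for (int i = 0; i < 256; i++)
        to_unicode[i] = NO_MAPPING;

    while (*text != '\0')
    {
        char line[256];
        size_t len = strcspn(text, "\n");
        size_t n = len < sizeof line ? len : sizeof line - 1;
        unsigned byte, code;

        /* Work on one line at a time so a missing field can't borrow
         * a number from the following line. */
        memcpy(line, text, n);
        line[n] = '\0';

        /* Comment lines and "0x81\t\t#UNDEFINED" entries yield < 2 fields. */
        if (sscanf(line, "%x %x", &byte, &code) == 2 && byte < 256)
            to_unicode[byte] = code;

        text += len;
        if (*text == '\n')
            text++;
    }
}

/* Build a 128-entry table for bytes 0x80-0xFF of SRC -> DST by matching
 * Unicode code points.  Both charsets are assumed ASCII-compatible, so a
 * high-half source byte can only match a high-half target byte.
 * An entry of 0 means "no equivalent character in DST". */
static void build_table(const unsigned src[256], const unsigned dst[256],
                        unsigned char table[128])
{
    assert(src != NULL && dst != NULL && table != NULL);

    for (int b = 0x80; b <= 0xFF; b++)
    {
        table[b - 0x80] = 0;
        if (src[b] == NO_MAPPING)
            continue;
        for (int d = 0x80; d <= 0xFF; d++)
        {
            if (dst[d] == src[b])
            {
                table[b - 0x80] = (unsigned char) d;
                break;
            }
        }
    }
}
```

For example, suppose an excerpt of 8859-2.TXT is used as the source and one of CP1250.TXT as the target. Then LATIN2 0xA1 (U+0104, A with ogonek) maps to WIN1250 0xA5, and a byte CP1250.TXT lists as undefined gets no translation. Comparing tables built this way against the existing hand-maintained arrays is one way to spot entries with no basis in the Unicode data.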
While the changes in the WIN1250 <-> LATIN2 conversions just amount to
removal of some translations that seem to have no basis in reality, the
changes in the Cyrillic mappings are quite a bit more extensive. It would
be good if we could get those checked by some native Russian speakers.
regards, tom lane
| Attachment | Content-Type | Size |
|---|---|---|
| encoding-conversion-corrections-2.patch | text/x-diff | 16.4 KB |
| buildmap.c | text/x-c | 3.2 KB |
| checkconv.sql | text/plain | 2.8 KB |
| diffs9.5vspatch | text/x-diff | 59.7 KB |