Re: Errors in our encoding conversion tables

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	pgsql-hackers(at)postgreSQL(dot)org
Cc:	Tatsuo Ishii <ishii(at)postgreSQL(dot)org>
Subject:	Re: Errors in our encoding conversion tables
Date:	2015-11-28 20:24:22
Message-ID:	32464.1448742262@sss.pgh.pa.us
Lists:	pgsql-hackers

I wrote:
> There's a discussion over at
> http://www.postgresql.org/message-id/flat/2sa.Dhu5.1hk1yrpTNFy.1MLOlb@seznam.cz
> of an apparent error in our WIN1250 -> LATIN2 conversion.

Attached is an updated patch (against today's HEAD) showing proposed
changes to bring cyrillic_and_mic.c and latin2_and_win1250.c into sync
with the Unicode Consortium's conversion data.

In addition, I've attached the C program I used to generate the proposed
new conversion tables from the Unicode/*.map files, a simple SQL script
to print out the conversion behavior for the affected conversions, and
a diff of the script's output between 9.5 and the proposed patch.
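
For readers without the attachments, the general idea of deriving a
single-byte conversion table by going through Unicode can be sketched
roughly as below. This is not buildmap.c: it is a minimal Python
illustration that uses Python's built-in cp1250 and iso8859_2 codecs in
place of the Unicode/*.map files, and the function name build_table and the
convention that 0 marks "no translation" are assumptions made for this
sketch only.

```python
def build_table(src: str, dst: str) -> list[int]:
    """Build a 128-entry table for source bytes 0x80..0xFF.

    Each entry is the destination byte reached by decoding the source
    byte to Unicode and re-encoding it in the destination encoding.
    An entry of 0 means "no translation" (a convention of this sketch).
    """
    table = []
    for b in range(0x80, 0x100):
        try:
            # Round-trip through Unicode: src byte -> code point -> dst byte.
            out = bytes([b]).decode(src).encode(dst)
        except UnicodeError:
            # Either the source byte is undefined in src, or the
            # character has no representation in dst.
            out = b""
        table.append(out[0] if len(out) == 1 else 0)
    return table


if __name__ == "__main__":
    win1250_to_latin2 = build_table("cp1250", "iso8859_2")
    # Print the table 8 entries per row, in the style of a C initializer.
    for row in range(0, 128, 8):
        print(", ".join(f"0x{v:02X}" for v in win1250_to_latin2[row:row + 8]))
```

Any entry in an existing hand-maintained table that is nonzero where such a
derived table has 0 would be a candidate for the kind of unsupported
translation discussed in this thread.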

While the changes in the WIN1250 <-> LATIN2 conversions just amount to
removal of some translations that seem to have no basis in reality, the
changes in the Cyrillic mappings are quite a bit more extensive. It would
be good if we could get those checked by some native Russian speakers.

regards, tom lane

Attachment	Content-Type	Size
encoding-conversion-corrections-2.patch	text/x-diff	16.4 KB
buildmap.c	text/x-c	3.2 KB
checkconv.sql	text/plain	2.8 KB
diffs9.5vspatch	text/x-diff	59.7 KB

In response to

Errors in our encoding conversion tables at 2015-11-26 20:30:31 from Tom Lane
