pgsql: Update display widths as part of updating Unicode

From: John Naylor <john(dot)naylor(at)postgresql(dot)org>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Update display widths as part of updating Unicode
Date: 2021-08-26 15:06:00
Message-ID: E1mJGx6-0002Xn-4v@gemulon.postgresql.org
Lists: pgsql-committers

Update display widths as part of updating Unicode

The hardcoded "wide character" set in ucs_wcwidth() was last updated
around the Unicode 5.0 era. This led to misalignment when printing
emojis and other codepoints that have since been designated
wide or full-width.

To fix and keep up to date, extend update-unicode to download the list
of wide and full-width codepoints from the official sources.
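
For illustration only, the kind of lookup such a generated table feeds
can be sketched as a binary search over sorted codepoint ranges. This is
not the code in wchar.c: the names cp_interval, sample_wide_table,
in_interval_table and sample_display_width are hypothetical, the table
is a tiny hand-picked excerpt rather than the generated header, and
control and non-spacing characters are ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive codepoint range (hypothetical type, for illustration). */
typedef struct
{
	uint32_t	first;
	uint32_t	last;
} cp_interval;

/*
 * Hand-picked excerpt of East Asian Wide/Fullwidth ranges, sorted by
 * codepoint and non-overlapping.  A real table would be generated from
 * the Unicode data files rather than typed in by hand.
 */
static const cp_interval sample_wide_table[] = {
	{0x1100, 0x115F},			/* Hangul Jamo initial consonants */
	{0x3041, 0x3096},			/* Hiragana */
	{0x4E00, 0x9FFF},			/* CJK Unified Ideographs */
	{0xFF01, 0xFF60},			/* Fullwidth ASCII variants */
	{0x1F600, 0x1F64F},			/* Emoticons */
};

/* Binary search: returns 1 if cp falls inside any range, else 0. */
static int
in_interval_table(uint32_t cp, const cp_interval *table, size_t len)
{
	size_t		lo = 0;
	size_t		hi = len;		/* search window is [lo, hi) */

	assert(table != NULL || len == 0);

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (cp > table[mid].last)
			lo = mid + 1;
		else if (cp < table[mid].first)
			hi = mid;
		else
			return 1;
	}
	return 0;
}

/* Simplified width: 2 columns for listed ranges, 1 for everything else. */
static int
sample_display_width(uint32_t cp)
{
	size_t		n = sizeof(sample_wide_table) / sizeof(sample_wide_table[0]);

	return in_interval_table(cp, sample_wide_table, n) ? 2 : 1;
}
```

With a table like this, an emoji such as U+1F600 is reported as two
columns wide, which is the alignment case described above. Keeping the
ranges sorted is what makes the binary search valid.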

In passing, remove some comments about non-spacing characters that
haven't been accurate since we removed the former hardcoded logic.

Jacob Champion

Reported and reviewed by Pavel Stehule
Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRCeX21O69YHxmykYySYyprZAqrKWWg0KoGKdjgqcGyygg(at)mail(dot)gmail(dot)com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/bab982161e0590746a2fd2a03043b27108b23ac6

Modified Files
--------------
src/common/unicode/.gitignore | 1 +
src/common/unicode/Makefile | 9 +-
.../generate-unicode_east_asian_fw_table.pl | 76 +++++++++++++
src/common/wchar.c | 41 +++----
src/include/common/unicode_east_asian_fw_table.h | 120 +++++++++++++++++++++
5 files changed, 220 insertions(+), 27 deletions(-)
