Re: encoding affects ICU regex character classification

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Jeremy Schneider <schneider(at)ardentperf(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: encoding affects ICU regex character classification
Date: 2023-12-16 01:23:53
Message-ID: CA+hUKGJNnoTLX76UdKbW7UECbh-DSjHOqFbU-agZbXA7TjmL+Q@mail.gmail.com
Lists: pgsql-hackers

On Sat, Dec 16, 2023 at 1:48 PM Jeremy Schneider
<schneider(at)ardentperf(dot)com> wrote:
> On 12/14/23 7:12 AM, Jeff Davis wrote:
> > The concern over unassigned code points is misplaced. The application
> > may be aware of newly-assigned code points, and there's no way they
> > will be mapped correctly in Postgres if the provider is not aware of
> > those code points. The user can either proceed in using unassigned code
> > points and accept the risk of future changes, or wait for the provider
> > to be upgraded.
>
> This does not seem to me like a good way to view the situation.
>
> Earlier this summer, a day or two after writing a document, I was
> completely surprised to open it on my work computer and see "unknown
> character" boxes. When I had previously written the document on my home
> computer and when I had viewed it from my cell phone, everything was
> fine. Apple does a very good job of always keeping iPhones and MacOS
> versions up-to-date with the latest versions of Unicode and latest
> characters. iPhone keyboards make it very easy to access any character.
> Emojis are the canonical example here. My work computer was one major
> version of MacOS behind my home computer.

That "SQUARE ERA NAME REIWA" codepoint we talked about in one of the
multi-version ICU threads was an interesting case study. It's not an
emoji, it entered real/serious use suddenly, landed in a quickly
wrapped minor release of Unicode, and then arrived in locale
definitions via regular package upgrades on various OSes AFAICT (i.e.
it didn't require a major version upgrade of the OS).

https://en.wikipedia.org/wiki/Reiwa_era#Announcement
https://en.wikipedia.org/wiki/Reiwa_era#Technology
https://unicode.org/versions/Unicode12.1.0/
