
Re: Update Unicode data to Unicode 16.0.0

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jeremy Schneider <schneider(at)ardentperf(dot)com>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Update Unicode data to Unicode 16.0.0
Date:	2025-03-15 16:15:36
Message-ID:	3481161.1742055336@sss.pgh.pa.us
Lists:	pgsql-hackers

Jeremy Schneider <schneider(at)ardentperf(dot)com> writes:
> On Fri, 07 Mar 2025 13:11:18 -0800
> Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>> The change in Unicode that I'm focusing on is the addition of U+A7DC,
>> which is unassigned in Unicode 15.1 and assigned in Unicode 16, which
>> lowercases to U+019B. The examples assume that the user is using
>> unassigned code points in PG17/Unicode15.1 and the PG_C_UTF8
>> collation.

> It seems the consensus is to update unicode in core... FWIW, I'm still
> in favor of leaving it alone because ICU is there for when I need
> up-to-date unicode versions.

> From my perspective, the whole point of the builtin collation was to
> have one option that avoids these problems that come with updating
> both ICU and glibc.

I don't really buy this argument. If we sit on Unicode 15 until that
becomes untenable, which it will, then people will still be faced
with a behavioral change whenever we bow to reality and invent a
"builtin-2.0" or whatever collation. Moreover, by then they might
well have instances of the newly-assigned code points in their
database, making the changeover real and perhaps painful for them.

On the other hand, if we keep up with the Joneses by updating the
Unicode data, we can hopefully put those behavioral changes into
effect *before* they'd affect any real data. So it seems to me
that freezing our Unicode data is avoiding hypothetical pain now
at the price of certain pain later.

I compare this to our routine timezone data updates, which certainly
have not been without occasional pain ... but does anyone seriously
want to argue that we should still be running tzdata from 20 years
back? Or even 5 years back?

In fact, on the analogy of timezones, I think we should not only
adopt newly-published Unicode versions pretty quickly but push
them into released branches as well. Otherwise the benefit of
staying ahead of real use of the new code points isn't there
for end users.

regards, tom lane

In response to

Re: Update Unicode data to Unicode 16.0.0 at 2025-03-15 06:54:41 from Jeremy Schneider

Responses

Re: Update Unicode data to Unicode 16.0.0 at 2025-03-15 17:22:48 from Jeff Davis
Re: Update Unicode data to Unicode 16.0.0 at 2025-03-17 23:15:21 from Jeff Davis
