
Re: Update Unicode data to Unicode 16.0.0

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Joe Conway <mail(at)joeconway(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Update Unicode data to Unicode 16.0.0
Date:	2025-03-20 12:45:42
Message-ID:	CA+TgmoZ7riCiacKzQmq=82Fu7B74A9MAAKqUmuv8BEeWHZMhTA@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Mar 19, 2025 at 5:47 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> Do you have a sketch of what the ideal Unicode version management
> experience might look like? Very high level, like "this is what happens
> by default during an upgrade" and "this is how a user discovers that
> that they might want to update Uniocde", etc.
>
> What ways can/should we nudge users to update more quickly, if at all,
> so that they are less likely to have problems with newly-assigned code
> points?
>
> And, if possible, how we might extend this user experience to libc or
> ICU updates?

As I think you know, I don't consider myself an expert in this area,
just somebody who has seen a decent amount of user pain (although I am
sure that even there some other people have seen more). That said, for
me the ideal would probably include the following things:

* When the collation/ctype/whatever definitions upon which you are
relying change, you can either decide to switch to the new ones
without rebuilding your indexes and risk wrong results until you
reindex, or you can decide to create new indexes using the new
definitions and drop the old ones.

* You're never forced to adopt new definitions during a SPECIFIC major
or minor release upgrade or when making some other big change to the
system. It's fine, IMHO, if we eventually remove support for old
stuff, but there should be a multi-year window of overlap. For
example, if PostgreSQL 42 adds support for Unicode 95.0.0, we'd keep
that support for, I don't know, at least the next four or five major
versions. So upgrading PG can eventually force you to upgrade
collation defs, but you don't get into a situation where PG 41
supports only Unicode < 95 and PG 42 supports only Unicode >= 95.

* In an absolutely perfect world, we'd have strong versioning of every
type of collation from every provider. This is probably very difficult
to achieve in practice, so a more realistic goal might be to get to
a point where most users, most of the time, are
relying on collations with strong versioning. For glibc, this seems
relatively hopeless unless upstream changes their policy in a big way.
For ICU, loading multiple library versions seems like a possible path
forward. Relying more on built-in collations seems like another
possible approach, but I think that would require us to have more than
just a code-point sort: we'd need to have built-in collations for
users of various languages. That sounds like it would be a lot of work
to develop, but even worse, it sounds like it would be a tremendous
amount of work to maintain. I expect Tom will opine that this is an
absolutely terrible idea that we should never do under any
circumstances, and I understand the sentiment, but I think it might be
worth considering if we're confident we will have people to do the
maintenance over the long term.

* I would imagine pg_upgrade either keeping the behavior unchanged for
any strongly-versioned collation, or failing. I don't see a strong
need to try to notify users about the availability of new versions
otherwise. People who want to stay current will probably figure out
how to do that, and people who don't will ignore any warnings we give
them. I'm not completely opposed to some other form of notification,
but I think it's OK if "we finally removed support for your extremely
old ICU version" is the driving force that makes people upgrade.
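A minimal sketch of how the overlap window in the second point and the pg_upgrade rule in the last point could fit together, written in Python for brevity. Everything here is hypothetical and is not PostgreSQL code: the `OVERLAP_RELEASES` constant, the `first_release` mapping, the `Collation` record, and the `supported_versions`/`upgrade_blockers` helpers are illustrative names, and "a version stays supported until N majors after its successor is added" is one assumed reading of the proposed window.

```python
from dataclasses import dataclass

# Hypothetical policy knob. The email suggests keeping old definitions for
# "at least the next four or five major versions"; 5 is an assumption.
OVERLAP_RELEASES = 5


def supported_versions(pg_major, first_release):
    """Return the set of Unicode versions a (hypothetical) PG major supports.

    first_release maps each Unicode version, as a tuple such as (95, 0, 0),
    to the PG major release that first added it. Assumed rule: a version
    stays supported until OVERLAP_RELEASES majors after its successor is
    added, so an old version never disappears in the same release that
    its replacement arrives.
    """
    ordered = sorted(first_release.items())  # oldest Unicode version first
    supported = set()
    for i, (version, added) in enumerate(ordered):
        if i + 1 < len(ordered):
            # End of support is measured from when the successor arrived.
            last = ordered[i + 1][1] + OVERLAP_RELEASES
        else:
            last = float("inf")  # newest version: no end of support yet
        if added <= pg_major <= last:
            supported.add(version)
    return supported


@dataclass
class Collation:
    name: str
    strongly_versioned: bool
    unicode_version: tuple  # definitions the old cluster's indexes rely on


def upgrade_blockers(collations, new_pg_major, first_release):
    """Return names of collations that would make a hypothetical pg_upgrade fail.

    A strongly-versioned collation must keep exactly the same definitions
    in the new cluster; if that version is gone, fail instead of silently
    switching. Collations without strong versioning (e.g. glibc) are not
    checked here, which is a simplifying assumption of this sketch.
    """
    available = supported_versions(new_pg_major, first_release)
    return [c.name for c in collations
            if c.strongly_versioned and c.unicode_version not in available]
```

With `first_release = {(94, 0, 0): 40, (95, 0, 0): 42}` (made-up numbers in the spirit of the PG 42 / Unicode 95 example), release 42 supports both versions, release 47 still does, and release 48 drops 94.0.0. A strongly-versioned collation still on 94.0.0 would therefore block an upgrade to 48 but not to 47.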

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Re: Update Unicode data to Unicode 16.0.0 at 2025-03-19 21:47:44 from Jeff Davis

Responses

Re: Update Unicode data to Unicode 16.0.0 at 2025-03-21 06:45:10 from Jeff Davis
