
Re: Update Unicode data to Unicode 16.0.0

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Joe Conway <mail(at)joeconway(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Update Unicode data to Unicode 16.0.0
Date:	2025-03-21 06:45:10
Message-ID:	74befde9b3112aa91a4394b2af22c27f39a00ad5.camel@j-davis.com
Lists:	pgsql-hackers

On Thu, 2025-03-20 at 08:45 -0400, Robert Haas wrote:
> * When the collation/ctype/whatever definitions upon which you are
> relying change, you can either decide to switch to the new ones
> without rebuilding your indexes and risk wrong results until you
> reindex, or you can decide to create new indexes using the new
> definitions and drop the old ones.

Would newly-created objects pick up the new Unicode version, or stick
with the old one?

> Relying more on built-in collations seems like another possible
> approach, but I think that would require us to have more than just a
> code-point sort: we'd need to have built-in collations for users of
> various languages. That sounds like it would be a lot of work to
> develop, but even worse, it sounds like it would be a tremendous
> amount of work to maintain. I expect Tom will opine that this is an
> absolutely terrible idea that we should never do under any
> circumstances, and I understand the sentiment, but I think it might be
> worth considering if we're confident we will have people to do the
> maintenance over the long term.

Supporting a built-in case-insensitive collation would be some work,
but it's not a huge leap now that we have CASEFOLD().
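
To illustrate the idea of building case-insensitive comparison on top of
case folding, here is a minimal sketch. It uses Python's str.casefold()
as a stand-in and is not PostgreSQL's implementation; the helper names
are hypothetical, and Unicode normalization is deliberately ignored.

```python
def casefold_key(s: str) -> str:
    # Sort/comparison key: the case-folded form of the string.
    # Ties between strings that fold identically are left to the caller.
    return s.casefold()


def equal_ignoring_case(a: str, b: str) -> bool:
    # Two strings compare equal when their case-folded forms match.
    return casefold_key(a) == casefold_key(b)


if __name__ == "__main__":
    print(equal_ignoring_case("Straße", "STRASSE"))
    print(sorted(["b", "A", "a", "B"], key=casefold_key))
```

The ordering here is still a code-point sort, only applied to the folded
strings, so it says nothing about natural language sort orders.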

Supporting built-in natural language sort orders would be a much larger
scope. And I don't think we need that, but that's a larger discussion.

> I don't see a strong
> need to try to notify users about the availability of new versions
> otherwise. People who want to stay current will probably figure out
> how to do that

What if we were able to tell, for instance, that your database has none
of the code points affected by the most recent update? Then updating
would be less risky than not updating: if you don't update Unicode,
those code points could end up in the database while still treated as
unassigned, and then cause a problem for future updates.
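
A rough sketch of such a check, assuming the set of affected code points
is available as inclusive ranges. The function name and the sample range
are hypothetical placeholders, not taken from the Unicode 16.0.0 data
files, and a real check would have to scan stored text in the database
rather than an in-memory list.

```python
from typing import Iterable, List, Tuple

# Inclusive (low, high) code point range.
CodePointRange = Tuple[int, int]


def affected_code_points(
    values: Iterable[str], affected_ranges: List[CodePointRange]
) -> List[int]:
    """Return the sorted, distinct code points found in `values` that
    fall inside any of `affected_ranges`."""
    hits = set()
    for value in values:
        for ch in value:
            cp = ord(ch)
            if any(low <= cp <= high for low, high in affected_ranges):
                hits.add(cp)
    return sorted(hits)


if __name__ == "__main__":
    # Placeholder range standing in for "code points changed by the update".
    sample_ranges = [(0x10D40, 0x10D8F)]
    rows = ["plain ascii", "caf\u00e9", "x" + chr(0x10D40)]
    found = affected_code_points(rows, sample_ranges)
    print([f"U+{cp:04X}" for cp in found])
```

An empty result would correspond to the case described above, where none
of the affected code points are present.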

Regards,
Jeff Davis

In response to

Re: Update Unicode data to Unicode 16.0.0 at 2025-03-20 12:45:42 from Robert Haas

Responses

Re: Update Unicode data to Unicode 16.0.0 at 2025-03-21 14:45:47 from Robert Haas
