Re: Update Unicode data to Unicode 16.0.0

From: Jeremy Schneider <schneider(at)ardentperf(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Joe Conway <mail(at)joeconway(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Update Unicode data to Unicode 16.0.0
Date: 2025-01-21 01:06:29
Message-ID: 20250120170629.112c1bff@ardentperf.com
Lists: pgsql-hackers

On Mon, 20 Jan 2025 13:39:35 -0800
Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> On Fri, 2024-11-15 at 17:09 +0100, Peter Eisentraut wrote:
> > The practice of regularly updating the Unicode files is older than
> > the builtin collation provider.  It is similar to updating the time
> > zone files, the encoding conversion files, the snowball files, etc.
> > We need to move all of these things forward to keep up with the
> > aspects of the real world that this data reflects.
>
> Should we consider bundling multiple versions of the generated tables
> (header files) along with Postgres?
>
> That would enable a compile-time option to build with an older version
> of Unicode if you want, solving the packager concern that Noah raised.
> It would also make it easier for people to coordinate the Postgres
> version of Unicode and the ICU version of Unicode.

FWIW, now that Postgres has ICU support, I personally don't think
there's a pressing need to keep updating the tables. I think ICU is
the best solution for people who need the latest linguistic collation
rules.

On the user side, my main concerns are the same as they've always
been: 100% confidence that Postgres updates will not corrupt any data
or cause incorrect query results, and not being forced to rebuild
everything (or logically copy data to avoid pg_upgrade). I'm at a large
company with many internal devs using Postgres in ways I don't know
about, and many users storing lots of Unicode data I don't know about.

I'm working a fair bit with Docker, Kubernetes, and CloudNativePG
now, so our builds come through the Debian PGDG repo. Bundling multiple
tables doesn't bother me, as long as it's not a precursor to later
removing current tables from the Debian PGDG builds we consume.

Ironically, it's not really an issue yet for us on Docker because
support for pg_upgrade is pretty limited at the moment. :) But I
think pg_upgrade support will rapidly improve in Docker, and will
become common on large databases.

If Postgres does go the path of multiple tables, does the community
want to accumulate a new set of tables every year? That could add up
quickly. Maybe we don't add new tables every year, but follow the
examples of Oracle and DB2 in accumulating them on a less frequent
basis?

-Jeremy
