Re: Collation version tracking for macOS

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Jeremy Schneider <schneider(at)ardentperf(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation version tracking for macOS
Date: 2022-06-07 21:38:31
Message-ID: CAH2-WzkEhSsk8iy0ReQ+3ircHMPfaFtXq56bPyZWM0tQiW67HQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 7, 2022 at 2:13 PM Jeremy Schneider
<schneider(at)ardentperf(dot)com> wrote:
> For my for my part, gut feeling is that MacOS major releases will be
> similar to any other OS major release, which may contain updates to
> collation algorithms and locales. ISTM like the same thing PG is looking
> for on other OS's to trigger the warning. But it might be good to get an
> official reference on MacOS, if someone knows where to find one? (I don't.)

I just don't think that we should be relying on a huge entity like
Apple or even glibc for this -- they don't share our priorities, and
there is no reason for this to change. The advantage of ICU versioning
is that it is just one library, that can coexist with others,
including other versions of ICU.

Imagine a world in which we support multiple ICU versions (for Debian
packages, say), some of which are getting quite old. Maybe we can
lobby for the platform to continue to support that old version of the
library -- there ought to be options. Lobbying Debian to stick with an
older version of glibc is another matter entirely. That has precisely
zero chance of ever succeeding, for reasons that are quite
understandable.

Half the problem here is to detect breaking changes, but the other
half is to not break anything in the first place. Or to give the user
plenty of opportunity to transition incrementally, without needing to
reindex everything at the same time. Obviously the only way that's
possible is by supporting multiple versions of ICU at the same time,
in the same database. This requires indirection that distinguishes
between "physical and logical" collation versions, where the same
nominal collation can have different implementations across multiple
ICU versions.

The rules for standards like BCP47 (the system that defines the name
of an ICU/CLDR locale) are deliberately very tolerant of what they
accept in order to ensure forwards and backwards compatibility in
environments where there isn't just one ICU/CLDR version [1] (most
environments in the world of distributed or web applications). So you
can expect the BCP47 name of a collation to more or less work on any
ICU version, perhaps with some loss of functionality (this is
unavoidable when you downgrade ICU to a version that doesn't have
whatever CLDR customization you might have relied on). It's very
intentionally a "best effort" approach, because throwing a "locale not
found" error message usually isn't helpful from the point of view of
the end user. Note that this is a broader standard than ICU or CLDR or
even Unicode.

[1] https://www.ietf.org/rfc/rfc6067.txt
--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2022-06-07 22:27:02 Re: Collation version tracking for macOS
Previous Message Bruce Momjian 2022-06-07 21:38:03 Re: Collation version tracking for macOS