Re: Collation version tracking for macOS

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Nasby, Jim" <nasbyj(at)amazon(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation version tracking for macOS
Date: 2022-11-24 02:07:43
Message-ID: 346f836208a39009c1998ed5a41c7f1a0be36911.camel@j-davis.com
Lists: pgsql-hackers

On Wed, 2022-11-23 at 18:08 +1300, Thomas Munro wrote:

> (1) the default behaviour on failure to search would
> likely be to use the linked library instead and WARN about
> [dat]collversion mismatch, so far the same, and 

Agreed.

> (2) the set of people
> who would really be prepared to compile their own copy of 67.X
> instead of downgrading or REFRESHing (with or without rebuilding) is
> vanishingly small.

The set of people prepared to do so is probably small. But the set of
people who will do it (prepared or not) when a problem comes up is
significantly larger ;-)

> 1.  *Do* they change ucol_getVersion() values in minor releases?  I
> tried to find a written policy on that.

It seems like a valid concern. The mere existence of a collation
version separate from the library major version seems to suggest that
it's possible. Perhaps they avoid it in most cases; but absent a
specific policy against it, the separate collation version seems to
allow them the freedom to do so.

> This speculation feels pretty useless.  Maybe we should go and read
> the code or ask an ICU expert, but I'm not against making it
> theoretically possible to access two different minor versions at
> once, just to cover all the bases for future-proofing.

I don't think this should be an overriding concern that drives the
whole design. It is a nudge in favor of search-by-collversion.

> 2.  Would package managers ever allow two minor versions to be
> installed at once?  I highly doubt it; 

Agreed.

I'm sure this has been discussed, but which distros even support
multiple major versions of ICU?

>
> 1.  search-by-collversion:  We introduce no new "library version"
> concept to COLLATION and DATABASE object and little or no new syntax.
> Whenever opening a collation or database, the system will search some
> candidate list of ICU libraries to try to find the one that agrees
> with [dat]collversion.

[...]

> The reason I prefer major[.minor] strings over whole library names is
> that we need to dlopen two of them so it's a little easier to build
> them from those parts than have to supply both names.

It also makes it easier to know which version suffixes to look for.

>   The reason I
> prefer to keep allowing major-only versions to be listed is that it's
> good to have the option to just follow minor upgrades automatically.

Makes sense.
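
As a rough illustration of the two points above (building both library
names from one version string, and knowing which symbol suffix to look
for), here is a minimal Python sketch. It is not taken from the patch:
the helper names are hypothetical, the Linux-style libicui18n.so.N /
libicuuc.so.N file naming is an assumption about the platform, and the
_N symbol suffix only applies to ICU builds that were not configured
with --disable-renaming.

```python
from typing import Tuple


def _parse_icu_version(version: str) -> Tuple[str, ...]:
    """Validate a 'major' or 'major.minor' string and return its parts."""
    parts = tuple(version.split("."))
    if len(parts) not in (1, 2) or not all(p.isdigit() for p in parts):
        raise ValueError(f"bad ICU version string: {version!r}")
    return parts


def icu_library_names(version: str) -> Tuple[str, str]:
    """Build the two shared-library names to dlopen for one version string.

    Assumption: Linux-style naming. A major-only string such as "63" yields
    the soname, which is typically a symlink to the newest installed minor
    release, so minor upgrades are followed automatically. A "63.2" string
    pins one specific minor release.
    """
    _parse_icu_version(version)
    return (f"libicui18n.so.{version}", f"libicuuc.so.{version}")


def icu_symbol_name(base: str, version: str) -> str:
    """Return the renamed symbol to look up, e.g. ucol_open -> ucol_open_63.

    Only the major version is used for the suffix; builds made with
    --disable-renaming would need the bare name instead.
    """
    major = _parse_icu_version(version)[0]
    return f"{base}_{major}"
```

With this shape, a single configured entry such as "63" or "63.2" is
enough to derive both file names and the symbol suffix, which is the
convenience being argued for; listing whole library names would need
two entries per version and a separate way to learn the suffix.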

> Or I guess you could make something that can automatically search a
> whole directory (which directory?) to find all the suitably named
> libraries so you don't ever have to mention versions manually (if you
> want "apt-get install libicu72" to be enough with no GUC change
> needed) -- is that too weird?

That seems to go a little too far.

>   SELECT * FROM pg_available_icu_libraries()
>   SELECT * FROM pg_available_icu_collation_versions('en')

+1

> 2.  lib-version-in-providers: We introduce a separate provider value
> for each ICU version, for example ICU63, plus an unversioned ICU like
> today.

I expressed interest in this approach before, but when you allowed ICU
compiled with --disable-renaming, that mitigated my concerns about when
to throw that error.

> 3.  lib-version-in-attributes: We introduce daticuversion (alongside
> datcollversion) and collicuversion (alongside collversion).

I think this is the best among 2-4.

> 4.  lib-version-in-locale:  "63:en" from earlier versions.  That was
> mostly a strawman proposal to avoid getting bogged down in
> syntax/catalogue/model change discussions while trying to prove that
> dlopen would even work.  It doesn't sound like anyone really likes
> this.

I don't see any advantage of this over 3.

> 5.  lib-version-in-collversion:  We didn't explicitly discuss this
> before, but you hinted at it: we could just use u_getVersion() in
> [dat]collversion.

The advantage here is that it's very easy to tell the admin what
library the collation is looking for, but the disadvantages you point
out seem a lot worse: migration problems from v15, and the minor
version question.

I'd vote for 1 on the grounds that it's easier to document and
understand a single collation version, which comes straight from
ucol_getVersion(). Finding a library that provides that collation
version, among whatever libraries the admin can supply, then becomes a
separate problem, and adding some observability into the search should
mitigate any confusion.
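
To make option 1 concrete, here is a minimal Python sketch of the
search, including the fall-back-and-WARN behaviour agreed on earlier in
this mail. It is only a model under stated assumptions: the function
name is hypothetical, the candidate list is passed in as plain data,
and the version strings in the usage note are placeholders. In a real
implementation each candidate's collation version would come from
calling ucol_getVersion() in a dlopen'd library.

```python
import warnings
from typing import List, Tuple

# (library label, collation version reported by that library)
Candidate = Tuple[str, str]


def choose_icu_library(wanted: str,
                       candidates: List[Candidate],
                       linked: Candidate) -> str:
    """Pick the library whose collation version matches [dat]collversion.

    candidates are examined in order; the first match wins. If none
    matches, fall back to the linked library and warn about the mismatch
    (unless the linked library happens to match after all).
    """
    for label, collversion in candidates:
        if collversion == wanted:
            return label

    linked_label, linked_version = linked
    if linked_version != wanted:
        warnings.warn(
            f"collation version mismatch: catalog has {wanted}, "
            f"linked library {linked_label} provides {linked_version}"
        )
    return linked_label
```

For example, with candidates [("icu-a", "v1"), ("icu-b", "v2")] and a
linked library ("icu-linked", "v3"), asking for "v2" selects "icu-b",
while asking for "v9" selects "icu-linked" and emits the warning. The
catalog only ever stores the one collation version; which library
supplies it is decided at open time.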

Can you go over the advantages of approaches 2-4 again? Is it just a
concern about burdening the admin with finding the right ICU library
version for a given collation version? That's a valid concern, but I
don't think that should be an overriding design point. It seems more
important to model the collation versions properly.

--
Jeff Davis
PostgreSQL Contributor Team - AWS
