From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Nasby, Jim" <nasbyj(at)amazon(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Collation version tracking for macOS |
Date: | 2022-11-24 02:07:43 |
Message-ID: | 346f836208a39009c1998ed5a41c7f1a0be36911.camel@j-davis.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, 2022-11-23 at 18:08 +1300, Thomas Munro wrote:
> (1) the default behaviour on failure to search would
> likely be to use the linked library instead and WARN about
> [dat]collversion mismatch, so far the same, and
Agreed.
> (2) the set of people
> who would really be prepared to compile their own copy of 67.X
> instead
> of downgrading or REFRESHing (with or without rebuilding) is
> vanishingly small.
The set of people prepared to do so is probably small. But the set of
people who will do it (prepared or not) when a problem comes up is
significantly larger ;-)
> 1. *Do* they change ucol_getVersion() values in minor releases? I
> tried to find a written policy on that.
It seems like a valid concern. The mere existence of a collation
version separate from the library major version seems to suggest that
it's possible. Perhaps they avoid it in most cases; but absent a
specific policy against it, the separate collation version seems to
allow them the freedom to do so.
> This speculation feels pretty useless. Maybe we should go and read
> the code or ask an ICU expert, but I'm not against making it
> theoretically possible to access two different minor versions at
> once,
> just to cover all the bases for future-proofing.
I don't think this should be an overriding concern that drives the
whole design. It is a nudge in favor of search-by-collversion.
> 2. Would package managers ever allow two minor versions to be
> installed at once? I highly doubt it;
Agreed.
I'm sure this has been discussed, but which distros even support
multiple major versions of ICU?
>
> 1. search-by-collversion: We introduce no new "library version"
> concept to COLLATION and DATABASE object and little or no new syntax.
> Whenever opening a collation or database, the system will search some
> candidate list of ICU libraries to try to find the one that agrees
> with [dat]collversion.
[...]
> The reason I prefer major[.minor] strings over whole library names is
> that we need to dlopen two of them so it's a little easier to build
> them from those parts than have to supply both names.
It also makes it easier to know which version suffixes to look for.
> The reason I
> prefer to keep allowing major-only versions to be listed is that it's
> good to have the option to just follow minor upgrades automatically.
Makes sense.
> Or I guess you could make something that can automatically search a
> whole directory (which directory?) to find all the suitably named
> libraries so you don't ever have to mention versions manually (if you
> want "apt-get install libicu72" to be enough with no GUC change
> needed) -- is that too weird?
That seems to go a little too far.
> SELECT * FROM pg_available_icu_libraries()
> SELECT * FROM pg_available_icu_collation_versions('en')
+1
> 2. lib-version-in-providers: We introduce a separate provider value
> for each ICU version, for example ICU63, plus an unversioned ICU like
> today.
I expressed interest in this approach before, but when you allowed ICU
compiled with --disable-renaming, that mitigated my concerns about when
to throw that error.
> 3. lib-version-in-attributes: We introduce daticuversion (alongside
> datcollversion) and collicuversion (alongside collversion).
I think this is the best among 2-4.
> 4. lib-version-in-locale: "63:en" from earlier versions. That was
> mostly a strawman proposal to avoid getting bogged down in
> syntax/catalogue/model change discussions while trying to prove that
> dlopen would even work. It doesn't sound like anyone really likes
> this.
I don't see any advantage of this over 3.
> 5. lib-version-in-collversion: We didn't explicitly discuss this
> before, but you hinted at it: we could just use u_getVersion() in
> [dat]collversion.
The advantage here is that it's very easy to tell the admin what
library the collation is looking for, but the disadvantages you point
out seem a lot worse: migration problems from v15, and the minor
version question.
I'd vote for 1 on the grounds that it's easier to document and
understand a single collation version, which comes straight from
ucol_getVersion(). This approach makes it a separate problem to find
the collation version among whatever libraries the admin can provide;
but adding some observability into the search should mitigate any
confusion.
Can you go over the advantages of approaches 2-4 again? Is it just a
concern about burdening the admin with finding the right ICU library
version for a given collation version? That's a valid concern, but I
don't think that should be an overriding design point. It seems more
important to model the collation versions properly.
--
Jeff Davis
PostgreSQL Contributor Team - AWS
From | Date | Subject | |
---|---|---|---|
Next Message | David Rowley | 2022-11-24 02:47:19 | Re: Hash index build performance tweak from sorting |
Previous Message | Thomas Munro | 2022-11-24 01:59:04 | Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt" |