Re: Collation version tracking for macOS

From: Jeremy Schneider <schneider(at)ardentperf(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Jim Nasby <nasbyj(at)amazon(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation version tracking for macOS
Date: 2022-06-09 17:54:01
Message-ID: EF3786A4-4254-456F-A9B8-C12C4AF3156C@ardentperf.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


> On Jun 8, 2022, at 22:40, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Wed, Jun 8, 2022 at 10:24 PM Jeremy Schneider
> <schneider(at)ardentperf(dot)com> wrote:
>> Even if PG supports two versions of ICU, how does someone actually go about removing every dependency on the old version and replacing it with the new?
>
> They simply REINDEX, without changing anything. The details are still
> fuzzy, but at least that's what I was thinking of.
>
>> Can it be done without downtime? Can it be done without modifying a running application?
>
> Clearly the only way that we can ever transition to a new "physical
> collation" is by reindexing using a newer ICU version. And clearly
> there is going to be a need to fully deprecate any legacy version of
> ICU on a long enough timeline. There is just no getting around that.

I’m probably just going to end up rehashing the old threads I haven’t read yet…

One challenge with this approach is you have things like sort-merge joins that require the same collation across multiple objects. So I think you’d need to keep all the old indexes around until you have new indexes available for all objects in a database, and somehow the planner would need to be smart enough to dynamically figure out old vs new versions on a query-by-query basis. May need an atomic database-wide cutover; running a DB with internally mixed collation versions doesn’t seem like a small challenge. It would require enough disk space for two copies of all indexes, and queries would change which indexes they use in a way that wouldn’t be immediately obvious to users or app dev. Suddenly switching to or from a differently-bloated index could result in confusing and sudden performance changes.

Also there would still need to be a plan to address all the other non-index objects where collation is used, as has been mentioned before.

And given the current architecture, that final “alter database update default collation” command still seems awful risky, bug-prone and difficult to get correct. At least it seems that way to me.

At a minimum, this is a very big project and it seems to me like it may be wise to get more end-to-end fleshing out of the plans before committing incremental pieces in core (which could end up being misguided if the plan doesn’t work as well as assumed). Definitely doesn’t seem to me like anything that will happen in a year or two.

And my opinion is that the problems caused by depending on OS libraries for collation need to be addressed on a shorter timeline than what’s realistic for inventing a new way for a relational database to offer transparent or online upgrades of linguistic collation versions.

Also I still think folks are overcomplicating this by focusing on linguistic collation as the solution. Like 1% of users actually need or care about having the latest technically correct local-language-based sorting, at a database level. MySQL did the right thing here by doing what every other RDBMS did, and just making a simple “good-enough” collation hardcoded in the DB, same across all platforms, that never changes.

The 1% of users who need true linguistic collation can probably deal with the trade-off of dump-and-load upgrades for their ICU indexes and databases for a few more years.

-Jeremy

Sent from my TI-83

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-06-09 18:20:01 Re: BTMaxItemSize seems to be subtly incorrect
Previous Message Peter Geoghegan 2022-06-09 17:39:17 Re: BTMaxItemSize seems to be subtly incorrect