Re: Update Unicode data to Unicode 16.0.0

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Update Unicode data to Unicode 16.0.0
Date: 2025-03-19 17:39:29
Message-ID: c5f5446753504ed9dfcb4ed1f822f1ba8c90d0ae.camel@j-davis.com
Lists: pgsql-hackers

On Wed, 2025-03-19 at 08:46 -0400, Robert Haas wrote:
> I see your point, but most people don't use the builtin collation
> provider.

The other providers aren't affected by us updating Unicode, so I think
we got off track somehow. I suppose what I meant was:

"If you are concerned about inconsistencies, and you move to the
builtin provider, then 99% of the inconsistency problem is gone. We can
remove the last 1% of the problem if we do all the work listed above."

> When an EDB customer asks "if I do X,
> will anything break," it's often the case that answering "maybe" is
> the same as answering "yes".

That's a good point. However, note that "doesn't break primary keys" is
a nice guarantee, even if there are still some remaining doubts about
expression indexes, etc.

> They want a hard guarantee that the behavior will not
> change.

My understanding of this thread so far was that we were mostly
concerned about internal inconsistencies of stored structures; e.g.
indexes that could return different results than a seqscan.
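To make that concern concrete, here is a toy model of the inconsistency. It is Python, not PostgreSQL code. The model has an expression index on a lowercasing function whose stored keys were computed with an older casing table, while a sequential scan recomputes the expression with the newer table. The codepoint U+E000 and its "new" mapping are hypothetical stand-ins, not an actual Unicode 16.0.0 change. lower() is a simplified per-codepoint function, not PostgreSQL's implementation.

```python
# Toy model of a stale expression index: stored keys use the OLD casing
# table, while a seqscan recomputes the expression with the NEW table.
# HYPOTHETICAL: U+E000 (private use) stands in for a codepoint whose
# lowercase mapping changed between Unicode versions.
CHANGED = "\ue000"
OLD_LOWER: dict[str, str] = {}          # old version: no mapping for it
NEW_LOWER = {CHANGED: "\ue001"}         # new version: hypothetical mapping

def lower(s: str, table: dict) -> str:
    """Simplified per-codepoint lowercase: table first, then ASCII rules."""
    return "".join(table.get(c, c.lower() if c.isascii() else c) for c in s)

rows = ["Apple", "b" + CHANGED]

# "Index" built before the upgrade: key -> list of row ids.
index: dict[str, list[int]] = {}
for rid, val in enumerate(rows):
    index.setdefault(lower(val, OLD_LOWER), []).append(rid)

def seqscan(query: str) -> list[int]:
    """Recompute lower() for every row using the new table."""
    key = lower(query, NEW_LOWER)
    return [rid for rid, val in enumerate(rows) if lower(val, NEW_LOWER) == key]

def indexscan(query: str) -> list[int]:
    """Probe the pre-upgrade index, whose stored keys are now stale."""
    return index.get(lower(query, NEW_LOWER), [])

print(seqscan("apple"), indexscan("apple"))                # [0] [0]
print(seqscan("b" + CHANGED), indexscan("b" + CHANGED))    # [1] []
```

Rows without any changed codepoint give the same answer either way. Only the row containing the changed codepoint is "lost" by the index.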

Not changing query results at all between major versions is a valid
concern, but a fairly strict one that doesn't seem limited to immutable
functions or collation issues. Surely, at least the results of "SELECT
version()" should change from release to release ;-)

> Again, I'm not trying to oblige
> you to deliver that behavior and I confess to ignorance on how we
> could realistically get there.

FWIW I'm not complaining about doing the work. But I think the results
will be better if we can get a few people aligned on a general plan and
collaborating. I will try to kick that off.

> and to be able to easily know exactly what they need to reindex.

That's the main one, I think. The upgrade check offers that for the
builtin provider, though admittedly it's not a very user-friendly
solution, and we can do better.

> And from that point of view -- and again, I'm not volunteering to
> implement it and I'm not telling you to do it either -- Joe's proposal
> of supporting multiple versions sounds fantastic.

I certainly don't oppose giving users that choice. But I view it as a
burden we are placing on the users -- better than breakage, but not
really great, either. So if we do put in a ton of work, I'd like it if
we could arrive at a better destination.

If we actually want the BEST user experience possible, they'd not even
really know that their index was ever inconsistent. Autovacuum would
come along and just find the few entries in the index that need fixing,
and reindex just those few tuples. In theory, it should be possible:
there are a finite number of codepoints that change each Unicode
version, and we can just search for them in the data and fix up derived
structures.
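The targeted repair described above might look roughly like the following toy sketch. It assumes casing is a simple per-codepoint mapping and that the set of changed codepoints can be derived by diffing the old and new tables. CHANGED_CODEPOINTS, repair_index() and the sample mapping are hypothetical illustrations. They are not an existing PostgreSQL or autovacuum facility.

```python
# Toy sketch of targeted index repair after a Unicode update.
# HYPOTHETICAL: real tables would come from the old and new Unicode data
# files; this made-up mapping for U+E000 stands in for a changed codepoint.
OLD_LOWER: dict[str, str] = {}
NEW_LOWER = {"\ue000": "\ue001"}

# Codepoints whose mapping differs between the two versions.
CHANGED_CODEPOINTS = {c for c in OLD_LOWER.keys() | NEW_LOWER.keys()
                      if OLD_LOWER.get(c) != NEW_LOWER.get(c)}

def lower(s: str, table: dict) -> str:
    """Simplified per-codepoint lowercase: table first, then ASCII rules."""
    return "".join(table.get(c, c.lower() if c.isascii() else c) for c in s)

def build_index(rows: list[str], table: dict) -> dict[str, set[int]]:
    """Build a key -> row-id-set index using the given casing table."""
    idx: dict[str, set[int]] = {}
    for rid, val in enumerate(rows):
        idx.setdefault(lower(val, table), set()).add(rid)
    return idx

def repair_index(rows: list[str], idx: dict[str, set[int]]) -> list[int]:
    """Re-key only rows containing a changed codepoint; return their ids."""
    fixed = []
    for rid, val in enumerate(rows):
        if not CHANGED_CODEPOINTS.intersection(val):
            continue  # no changed codepoint: the stored key is still correct
        old_key, new_key = lower(val, OLD_LOWER), lower(val, NEW_LOWER)
        idx[old_key].discard(rid)
        if not idx[old_key]:
            del idx[old_key]
        idx.setdefault(new_key, set()).add(rid)
        fixed.append(rid)
    return fixed

rows = ["Apple", "b\ue000", "cherry"]
idx = build_index(rows, OLD_LOWER)
print(repair_index(rows, idx))              # [1]
print(idx == build_index(rows, NEW_LOWER))  # True
```

The sketch still reads every row to find affected values, but it rewrites only the index entries whose keys actually changed. If only a few codepoints changed, that should be a small fraction of the index.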

Regards,
Jeff Davis
