Re: [18] Policy on IMMUTABLE functions and Unicode updates

From: Peter Eisentraut <peter(at)eisentraut(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org, Daniel Verite <daniel(at)manitou-mail(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeremy Schneider <schneider(at)ardentperf(dot)com>
Subject: Re: [18] Policy on IMMUTABLE functions and Unicode updates
Date: 2024-07-23 20:36:12
Message-ID: ed6cc199-cfb6-4feb-9439-4451a4ee0520@eisentraut.org
Lists: pgsql-hackers

On 22.07.24 19:55, Robert Haas wrote:
> Every other piece of software in the world has to deal with changes as
> a result of the addition of new code points, and probably less
> commonly, revisions to existing code points. Presumably, their stuff
> breaks too, from time to time. I mean, I find it a bit difficult to
> believe that web browsers or messaging applications on phones only
> ever display emoji, and never try to do any sort of string sorting.

The sorting isn't the problem. We have a versioning mechanism for
collations. What we do with the version information is clearly not
perfect yet, but the mechanism exists and you can hack together queries
that answer the question: did anything change here that would affect my
indexes? And you could build more tooling around that and so on.

The problem being considered here is updates to Unicode itself, as
distinct from the collation tables. A Unicode update can impact at
least two things:

- Code points that were previously unassigned are now assigned. That's
obviously a very common thing with every Unicode update. The new
character will have new properties attached to it, so the result of
various functions that use such properties (upper(), lower(),
normalize(), etc.) could change, because previously the code point had
no properties, and so those functions would not do anything interesting
with the character.

- Certain properties of an existing character can change. Like, a
character used to be a letter and now it's a digit. (This is an
example; I'm not sure if that particular change would be allowed.) In
the extreme case, this could have the same impact as the above, but in
practice the kinds of changes that are allowed wouldn't affect typical
indexes.
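
To make the first case concrete, here is a minimal sketch in Python rather
than PostgreSQL, using the Unicode tables bundled with the interpreter
(whatever version unicodedata.unidata_version reports). The helper names
first_unassigned_code_point and is_inert are made up for this illustration.
It only shows that a code point the tables consider unassigned passes
through case mapping and normalization unchanged; it says nothing about how
PostgreSQL implements these functions.

```python
import sys
import unicodedata


def first_unassigned_code_point() -> str:
    """Return the lowest code point these Unicode tables list as unassigned (Cn)."""
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Cn":
            return ch
    raise RuntimeError("no unassigned code point found")


def is_inert(ch: str) -> bool:
    """True if case mapping and all four normalization forms leave ch unchanged."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return (
        ch.upper() == ch
        and ch.lower() == ch
        and all(unicodedata.normalize(form, ch) == ch for form in forms)
    )


# Hypothetical demo values: which code point is found depends on the
# Unicode version shipped with the running interpreter.
unassigned = first_unassigned_code_point()
assigned = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, assigned for a long time

print("Unicode tables:", unicodedata.unidata_version)
print(f"U+{ord(unassigned):04X}", unicodedata.category(unassigned), is_inert(unassigned))
print(f"U+{ord(assigned):04X}", unicodedata.category(assigned), is_inert(assigned))
```

If a later Unicode version assigns that code point and gives it, say, an
uppercase mapping, the same call on the same input would start returning a
different result, which is the kind of change at issue for IMMUTABLE
functions.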

I don't think this has anything in particular to do with the new builtin
collation provider. That is just one new consumer of this.
